Scale adaptive filtering

ABSTRACT

A scale adaptive filtering scheme is developed for underspread channels based on a model of the linear time varying channel operator as a process in scale. Recursions serve the purpose of adding detail to the filter estimate until a suitable measure of fidelity and complexity is achieved.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to processing methods for adaptive filtering of a source signal for transmission and, more particularly, to an adaptive filtering method that identifies the relationship between a source signal and a received signal.

2. Description of the Related Art

It will be readily appreciated by one skilled in the art that the use of adaptive filtering operations may be required in a variety of different signal processing applications. One example of such an area is that of acoustics and specifically sonar signaling. Adaptive filtering systems for time varying operators have broad applications. For example, echo cancellation, active and passive sonar, and equalization of communication signals employ adaptive filtering modeling regimes for extracting information from signals. These modeling regimes are often limited by available memory, computational resources, and acceptable processing delays.

Multiresolution models of time-varying operators have been promoted with the idea of exploiting the sparsity, or “economy,” of representation attendant in them. This is a useful concept, since many operators that are not shift invariant are succinctly described in wavelet bases, the basic building block of multiresolution models. Conventional approaches have given a general framework for formulating multiresolution filter structures and have provided fast least-mean-square-like estimators with an assumed maximal scale of representation. However, this assumption is often overly optimistic, in which case the maximal scale of representation, or the most economical wavelet basis, can be jointly estimated.

With regard to adaptive filtering, multiresolution models naturally allow estimation of the response at a given time instant based on the given data. In addition, the model circumvents the need for the local stationarity assumptions of time-recursive algorithms. Accordingly, a larger amount of information is available for the estimation problem at each time instant. In-time estimation algorithms are by their very nature causal or nearly causal, so classic in-time adaptive filtering algorithms do not account for dependencies across time in the forward-looking direction. In-time estimation is nonetheless a powerful paradigm owing to its computational efficiency and memory requirements; it continues to find broad appeal and enjoys superior performance in a variety of applications. Nevertheless, in areas where fading and multipath delay do not conform to the wide-sense stationary (WSS) assumption, the multiresolution model is a viable solution. Other practical considerations related to multiresolution modeling include signal processing and communication schemes based on finite-duration signaling. For example, mobile radio and underwater acoustic communications employ a sequence of finite-duration data packets to transmit information. Similarly, in underwater target localization by active sonar, as well as in radar applications, source signals employ time-localized “pings.”

Accordingly, a need exists for a scale adaptive filtering method that is able to match a received signal, based on information from a source signal, by estimating a variety of parameters. In such a method a channel operator, which accounts for time delays and frequency spreads, is built up in scale. For this purpose, each additional increment of the Doppler spread is hypothesized and then tested.

BRIEF SUMMARY OF THE INVENTION

An aspect of the present invention provides a scale adaptive filtering method to produce a match of the received signal based on the source signal, by estimating the channel operator at each scale along with nuisance parameters, for example.

This and other aspects are substantially achieved by providing a system and method for scale adaptive filtering to identify a relationship between a source signal and an actual received signal. The method includes calculating a scale wherein the scale includes a time delay spread and a frequency spread, estimating at least one wavelet coefficient, determining an estimated received signal at the scale using the wavelet coefficient, determining when the estimated received signal is sufficiently similar with respect to the actual received signal, and recalculating the scale by a factor when the estimated received signal is not sufficiently similar with respect to the actual received signal.

The above and still further aspects, features and advantages of the present invention will become apparent upon consideration of the following definitions, descriptions and descriptive figures of specific embodiments thereof, wherein like reference numerals in the various figures are utilized to designate like components. While these descriptions go into specific details of the invention, it should be understood that variations may and do exist and would be apparent to those skilled in the art based on the descriptions herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart representing a scale adaptive filtering method in accordance with an aspect of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

While the invention will be described in detail with reference to specific embodiments thereof, it will be apparent to one skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope thereof. Accordingly, it is intended that the present invention covers modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Scale adaptive filtering is a general estimation method employed to estimate the relation between a received signal and a source signal in a variety of transmission environments. As is well known in the art, signaling information can be lost or distorted between the source signal and the received signal.

The present invention is a novel method of applying transform function analysis to the received signal at a specific scale, thereby forming an estimated received signal. This estimated received signal is then compared to the actual received signal. If the estimated received signal is sufficiently complex to model the actual received signal, the transform analysis has been successful. If not, the scale is doubled and the transform analysis is applied again. This process is repeated iteratively until the estimated received signal is sufficiently complex to model the actual received signal. Sufficiency is achieved when a suitable measure of error is minimized without increasing the complexity of the channel operator. The complexity is limited so that the noise is not included in the modeled signal. In addition, whether the signals are sufficiently similar is also determined by employing the empirical Bayes estimator (a general statistical method), wherein the estimator provides a particular cost function.

The flow chart 10 of FIG. 1 illustrates the operation of an exemplary method of scale adaptive filtering in accordance with the present invention. Block 15 is the start block. Block 20 includes generating a source signal from a source, such as a transmitter on a submarine or other underwater vehicle, and receiving an actual received signal at a receiver. Block 25 is descriptive of the step of calculating a minimal scale, such that the scale includes time delay information and frequency spread information.

Block 30 is descriptive of the step of estimating a wavelet coefficient to be used in a transform analysis to determine an estimated received signal. In wavelet analysis, it is helpful to view a signal in terms of a fully scalable modulated window. The window is shifted along the signal, and for every position the spectrum is calculated. This process is repeated many times with a slightly shorter (or longer) window for every iteration. The end result is a collection of time-frequency representations of the signal. In the case of wavelets, one normally speaks not of time-frequency representations but of time-scale representations, scale being a means of quantifying the time variation in the channel operator.
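A minimal sketch can illustrate the time-scale idea. It uses the discrete orthonormal Haar wavelet transform, an assumption chosen for brevity. This is not the windowed analysis described above, and not necessarily the wavelet family used by the invention. The helper names `haar_dwt` and `haar_idwt` are hypothetical. A piecewise-constant signal concentrates its energy in very few coefficients, which is the kind of sparsity the method exploits.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x, levels):
    """Multilevel orthonormal Haar transform (hypothetical helper).

    Returns (approx, details); details[0] is the coarsest detail band.
    len(x) must be divisible by 2**levels.
    """
    approx = list(x)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2**levels")
        half = len(approx) // 2
        lo = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in range(half)]
        hi = [(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in range(half)]
        details.insert(0, hi)  # coarser bands end up at the front
        approx = lo
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt: restore detail bands from coarsest to finest."""
    x = list(approx)
    for hi in details:
        finer = []
        for a, d in zip(x, hi):
            finer.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        x = finer
    return x
```

For `[1, 1, 1, 1, 5, 5, 5, 5]` with three levels, only one detail coefficient (the coarsest, equal to $-4\sqrt2$) is nonzero, and the inverse transform reproduces the input.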

Block 35 is descriptive of the step of determining an estimated received signal at the minimal scale, initially calculated in Block 25, employing the wavelet coefficient of Block 30.

Block 40 is descriptive of the step of determining whether the estimated received signal is sufficiently similar to the actual received signal. A sufficiently similar signal is achieved when a suitable measure of error is minimized without increasing the complexity of the channel operator. This takes into account the amount of noise in the signal information: the noise or nuisance parameters in the actual received signal should not be modeled in the estimated received signal. If the estimated received signal is not sufficiently similar, the method proceeds to Block 45, where the scale is increased by doubling it, and the wavelet coefficient is recalculated by returning to Block 30. Block 40 then again checks the estimated received signal against the actual received signal. Once the two signals are sufficiently similar, according to the parameters outlined above, the method proceeds to Block 50 and ends.
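The loop of Blocks 25 through 50 can be sketched under two stated assumptions. First, the hypothetical `block_mean_model` stands in for Blocks 30 and 35 by fitting a piecewise-constant (Haar scaling-function) model on `scale` equal segments. Second, Block 40's similarity test is replaced by a simple noise-level check on the mean squared residual, so that the noise itself is not modeled. Neither stand-in is the estimator described later in this specification.

```python
def block_mean_model(r, scale):
    """Hypothetical stand-in for Blocks 30-35: piecewise-constant fit on `scale` segments."""
    n = len(r)
    if n % scale:
        raise ValueError("len(r) must be divisible by the scale")
    seg = n // scale
    est = []
    for b in range(scale):
        chunk = r[b * seg:(b + 1) * seg]
        est.extend([sum(chunk) / seg] * seg)
    return est

def scale_adaptive_fit(r, noise_var, min_scale=1):
    """Blocks 25-50: start at a minimal scale and double it (Block 45) until the
    mean squared residual falls to the assumed noise level (stand-in for Block 40).

    len(r) and min_scale are assumed to be powers of two.
    """
    n = len(r)
    scale = min_scale
    while True:
        est = block_mean_model(r, scale)
        resid = sum((a - b) ** 2 for a, b in zip(r, est)) / n
        if resid <= noise_var or scale >= n:
            return scale, est
        scale *= 2
```

With a step signal of eight zeros followed by eight ones, a zero noise level stops at scale 2, where the fit is exact. A noise level of 0.3 already accepts scale 1, because the residual power there is 0.25.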

Flow chart 10 is an overview of the method of the present invention. The following is an exemplary embodiment of this system and method. Specifically, when the maximal scale (or frequency) of representation of the most economical wavelet coefficients cannot be assumed, it is jointly estimated. Accordingly, the present invention exploits sparsity of representation under a priori uncertainty regarding the Doppler spread and the degree of sparsity of the operator.

Overall, the present invention employs three estimators: first, an estimator of the wavelet coefficients of the channel response based on a flexible and adaptable Gaussian mixture model; second, an estimator of the minimal scale of the multiresolution space in which the operator resides; and finally, an estimator of the nuisance parameters associated with the mixture model.

It is shown that estimation of the maximal scale, which is synonymous with the Doppler spread, is akin to model order selection, and estimators for this parameter are provided. These channel estimators are recursive in scale, each relying on the preceding lower-scale estimate as a starting point for the next higher-resolution estimate. In this framework the channel is gradually built up in scale rather than in time. With each addition of detail, a greater Doppler spread (decreased coherence time) is hypothesized and tested against the data in a model selection framework. Thus multiple resolutions of the operator are provided in sequence, with finer-scale details descending from the lower-scale estimates, starting with the linear time-invariant (LTI) estimate. The algorithm as presented is limited to filter operators that are underspread, i.e., $B\tau_{\max}<1$, where $\tau_{\max}$ is the delay spread of the filter and $B$ is the maximal Doppler bandwidth associated with any of the scatterers. This requirement excludes the underdetermined parameter estimation case.
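A short sketch can make the underspread condition and the sequence of hypothesized Doppler spreads concrete. It assumes $B_j=2^j/T$ as the Doppler spread hypothesized at scale $j$; the text only states $B\propto 2^J/T$. The function names are hypothetical.

```python
import math

def is_underspread(B, tau_max):
    """Underspread condition B * tau_max < 1 (B in Hz, tau_max in seconds)."""
    return B * tau_max < 1.0

def doppler_hypotheses(T, B_max):
    """Scales j = 0..J with J = floor(log2(B_max * T)).

    j = 0 is the LTI starting point; each step up doubles the hypothesized
    Doppler spread (assumed here to be 2**j / T), halving the coherence time.
    """
    J = math.floor(math.log2(B_max * T))
    return [(j, 2 ** j / T) for j in range(J + 1)]
```

For example, a 10 Hz spread with 50 ms of delay spread is underspread ($B\tau_{\max}=0.5$), while 50 Hz with the same delay spread is not. With $T=1$ s and $B_{\max}=16$ Hz, five hypotheses (1, 2, 4, 8, 16 Hz) are tested in turn.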

The following provides preliminary notation to address the time-varying channel, as well as an introduction to the empirical Bayes approach. We review the multiresolution decomposition of a time-varying operator and present the in-scale maximum likelihood estimator, based on the conjugate gradient algorithm for in-scale recursions. We then introduce a suitable sparsity prior for the time-varying channel, which leads to the posterior mean and variance, along with estimators of the nuisance parameters associated with the sparsity model. Next, we present Doppler spread estimates and stopping rules based on the maximum a posteriori (MAP) criterion and on Laplace's approximation, as well as the simple and effective Akaike information criterion (AIC). Finally, the usefulness of these algorithms is demonstrated on a diverse set of simulated channels with various types and degrees of Doppler spread. Improved performance of the empirical Bayes posterior mean estimator over the maximum likelihood estimator is demonstrated.

The following definitions are provided for ease of reference and will be used throughout. Without loss of generality, all matrices and vectors are assumed real. Denote the $M\times N$ matrix $A$ by $A_{M\times N}$ and the $N\times 1$ vector $\vec a$ by $\vec a_N$. Let $\vec 1_N$ denote a column vector of 1's, $\vec 0_N$ a column vector of 0's, and $I_N$ the $N\times N$ identity matrix.

Definitions:

1. For a vector $\vec a_N$, $\mathrm{Diag}(\vec a)$ denotes the diagonal matrix with diagonal elements $\vec a_N$. For a square matrix $A_{N\times N}$, $\mathrm{diag}(A)$ denotes the vector of its diagonal elements.
2. Stack operator: $[\cdot]^{st}$ acts on a matrix $A_{M\times N}=[\vec a_1,\vec a_2,\ldots,\vec a_N]$ to yield $[A]^{st}=[\vec a_1',\ldots,\vec a_N']'$, a stacking of the columns of $A$ into a supercolumn. Its inverse is denoted $[\cdot]^{ist}$, so that $A=\left[[A]^{st}\right]^{ist}$.
3. Kronecker product: $A_{M\times N}\otimes B_{L\times K}$ is the $LM\times NK$ matrix of weighted $B$ blocks whose $\{m,n\}$th block is $a_{m,n}B$.
4. Elementwise product: $(A_{M\times N}\odot B_{M\times N})_{m,n}=a_{m,n}b_{m,n}$.
5. Vector square root: $\left(\sqrt{\vec a}\right)_n=\sqrt{a_n}$.
6. Convolution matrix: the operator $\check H$ associated with the array $H_{M\times N}$ with $n$th column $H_n$ has $N+M-1$ rows and $N$ columns; its $n$th column is $\check H_n=[\vec 0_{n-1}',H_n',\vec 0_{N-n}']'$.

Paragraphs 58 through 61 list a few useful properties and identities that follow directly from these definitions.
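Definitions 2 through 4 can be checked numerically with a few list-based helpers. The helper names are hypothetical, and plain Python lists of rows stand in for matrices. The check shown is the stacking identity $[AXB]^{st}=(B'\otimes A)[X]^{st}$. This is a standard consequence of these definitions and is the kind of relation used to pass between the stacked and matrix forms of (2) and (3).

```python
def stack(A):
    """[A]^st: concatenate the columns of A (a list of rows) into one column."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

def unstack(v, rows):
    """[v]^ist: inverse of stack for a matrix with `rows` rows."""
    cols = len(v) // rows
    return [[v[j * rows + i] for j in range(cols)] for i in range(rows)]

def kron(A, B):
    """Kronecker product A (x) B: the {m,n}th block is a_{m,n} * B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def hadamard(A, B):
    """Elementwise product: (A (.) B)_{m,n} = a_{m,n} * b_{m,n}."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# Stacking identity: [A X B]^st == (B' (x) A) [X]^st
A = [[1, 2], [3, 4]]
X = [[1, 0, 2], [-1, 3, 1]]
B = [[2, 1], [0, 1], [1, -1]]
lhs = stack(matmul(matmul(A, X), B))
rhs = matvec(kron(transpose(B), A), stack(X))
```

Integer entries keep the comparison exact, so `lhs == rhs` holds with no tolerance.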

The empirical Bayes (EB) approach is useful primarily for its computational efficiency. Consider computation of $E[h|r]$, $\mathrm{var}[h|r]$ and $E[\phi|r]$, where $r$ represents received data, $h$ a parameter set of interest and $\phi$ a set of nuisance parameters. Expectations over the joint density $p(h,\phi|r)$ are often not easily computed. EB methods leverage the crude assumption $p(\phi|r)\approx\delta(\phi-E[\phi|r])$ to approximate expectations via the iteration $E[h|r]^{(l)}\approx E[h|r,\phi=E[\phi|r]^{(l-1)}]$. This is synonymous with approximating the posterior marginal density by $p(h|r)\approx p(h|r,\phi=E[\phi|r])$. The method remains versatile when other suitable estimators are used in place of $\hat\phi=E[\phi|r]^{(l-1)}$, where their computation is faster or warranted by knowledge of the distribution. The penalty of this approximation is underestimation of the variance, since $\mathrm{var}[h|r]=E_\phi[\mathrm{var}[h|\phi,r]]+\mathrm{var}_\phi[E[h|\phi,r]]$. EB approaches therefore provide a lower bound, $\mathrm{var}[h|r]\geq E_\phi[\mathrm{var}[h|r,\phi]]$, that is useful when $\mathrm{var}_\phi[E[h|\phi,r]]$ is relatively small. For adaptive filtering problems, where the computational demands of full Bayesian analysis are presently out of reach, the approach has merit.
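A toy instance of the EB iteration follows. Consider $r_i=h_i+n_i$ with $h_i\sim\mathcal N(0,\phi)$ and known noise variance $\sigma^2$. The conditional posterior given $\phi=\hat\phi$ is Gaussian, so each pass plugs $\hat\phi$ into $E[h|r,\phi=\hat\phi]$ and then re-estimates $\phi$ from the resulting posterior moments. This EM-style update is an illustrative assumption, not the mixture-model estimator of the invention, and the function name is hypothetical. The returned variance is the plug-in quantity $\mathrm{var}[h|r,\hat\phi]$, which ignores $\mathrm{var}_\phi[E[h|\phi,r]]$ as noted above.

```python
def eb_gaussian_shrinkage(r, sigma2, phi0=1.0, iters=200):
    """Empirical Bayes for r_i = h_i + n_i, h_i ~ N(0, phi), n_i ~ N(0, sigma2).

    Returns (posterior means, plug-in posterior variance, phi_hat).
    """
    n = len(r)
    phi = phi0
    for _ in range(iters):
        gain = phi / (phi + sigma2)                # shrinkage factor given phi_hat
        mean = [gain * x for x in r]               # E[h_i | r, phi = phi_hat]
        var = gain * sigma2                        # var[h_i | r, phi = phi_hat]
        phi = sum(m * m for m in mean) / n + var   # average of E[h_i^2 | r, phi_hat]
    gain = phi / (phi + sigma2)
    return [gain * x for x in r], gain * sigma2, phi
```

For `r = [3, -1, 1, -3]` and $\sigma^2=1$, the fixed point is $\hat\phi=\overline{r^2}-\sigma^2=4$, giving a shrinkage factor of 0.8 and a plug-in variance of 0.8. When $\overline{r^2}\le\sigma^2$, $\hat\phi$ drifts toward zero and convergence slows.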

Define two multiresolution (MR) spaces, one over time and the other over delay, and let $\{\Omega_j^\xi\}_{j\in\mathbb Z}$ and $\{\Omega_k^\psi\}_{k\in\mathbb Z}$ represent them, respectively. Thus $\Omega_J^\xi=\left\{\sum_{j=0}^{J}\sum_{l\in\mathbb Z}c_{j,l}\,\xi_{j,l}(t):\|\vec c\|_2<\infty\right\}$, where $\xi_{0,0}(t)=\sum_l f_l^\xi\,\xi_{0,0}(2t-l)$ is the scaling function associated with this MR space, and the lowpass filter $f^\xi$ and highpass filter $g^\xi$ form a QMF pair. The scale index $j$ increases with increasing bandwidth, in agreement with the convention on the wavelets that $\xi_{j,l}(t)=\sqrt{2^{j-1}}\,\xi_{1,0}(2^{j-1}t-l)$ and $\xi_{1,0}(t)=\sum_l g_l^\xi\,\xi_{0,0}(2t-l)$.
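The QMF relationship between $f^\xi$ and $g^\xi$ can be checked numerically. This sketch assumes the orthonormal convention $\sum_l f_l=\sqrt2$ and the alternating-flip rule $g_l=(-1)^l f_{L-1-l}$. It uses the 4-tap Daubechies lowpass filter purely as an example; the text prescribes neither choice. The helper names are hypothetical.

```python
import math

def qmf_highpass(f):
    """Alternating flip: g_l = (-1)^l * f_{L-1-l}."""
    L = len(f)
    return [(-1) ** l * f[L - 1 - l] for l in range(L)]

def shifted_inner(a, b, shift):
    """sum_l a_l * b_{l+shift}, treating samples outside the support as zero."""
    return sum(a[l] * b[l + shift] for l in range(len(a)) if 0 <= l + shift < len(b))

r3, r2 = math.sqrt(3.0), math.sqrt(2.0)
# Daubechies 4-tap lowpass filter (example choice of f^xi)
f_d4 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2), (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]
g_d4 = qmf_highpass(f_d4)
```

The pair satisfies the usual orthogonality conditions. The lowpass filter has unit energy and is orthogonal to its own even shifts. The highpass filter sums to zero and is orthogonal to the lowpass filter at every even shift.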

Following the filter model of Doroslovacki, H_(B)(t, τ)∈Ω_(J) ^(ξ)×Ω_(K) ^(ψ) implies

$$H_B(t,\tau)=\sum_{j,k}^{J,K}\sum_{l,m}^{L_j,M_k}w_{j,k,l,m}\,\xi_{j,l}(t)\,\psi_{k,m}(\tau),\qquad L_j=BT\,2^{j-J-1},\quad M_k=F\tau_{\max}\,2^{k-K-1}\tag{1}$$

This model has approximate Doppler spread $B\propto2^J/T$ and represents the time-varying filter perfectly on the domain $(t,\tau)\in(0,T)\times(0,\tau_{\max})$. The model (1) implicitly specifies the maximal Doppler spread and is therefore denoted with the subscript $B$.

Frequency selectivity in the band $[2^{K-k-1},2^{K-k}]F$ is modeled by the $F\tau_{\max}2^{k-K}$ basis functions $\{\psi_{k,m}\}_{k,m}$ at delay shifts $2^{K-k+1}m/F$. In this way, the frequency selectivity of the channel due to diverse scatterer locations and delay spreads is modeled as a superposition of wavelet bases.

Moving scatterers induce Doppler spread. This is synonymous with an effective modulation of the channel over time, i.e., a bandwidth imparted to the impulse response process over time at any given delay. Modulations at time $t\approx2^{-j-1}lT$ with durations $\propto T/2^j$ are modeled with the basis functions $\{\xi_{j,l}(t)\}_{j,l}$. These modulations correspond to each scatterer's motion, such that scatterers with greater accelerations yield a channel operator with projections onto bases at fine scales (i.e., large $J$).

Let $H_B(n,m)=H(n/2^qB,\,m/F)$, $q>1$, be a suitably sampled version of $H(t,\tau)$ in time $t$ and delay $\tau$. Represent this 2-D array as a single stacked column $\vec h_B$ of time-invariant filters via paragraph 21, definition 2, each filter operating to produce $F/2^qB$ samples of the output. The wavelet coefficients of $H(t,\tau)$ (nonzero up to scale $J$) are then expressed by

$$\vec w_J=U_J\left(D_\xi^{-q}\otimes I_{2^K}\right)\vec h_B\quad\Longleftrightarrow\quad W_J=\Psi_K'\,H_B\,D_\xi^q\,\Xi_J.\tag{2}$$

The matrix $D_\xi^q$ is the $2^{J+q}$ by $2^J$ decimation operator associated with the MR space of $\xi$. Its adjoint $D_\xi^{q\prime}=D_\xi^{-q}$ is the associated interpolation operator. The columns of $\Xi_J$ and $\Psi_K$ are the expansion coefficients of the wavelets $\xi_{j,l}(t)$ and $\psi_{k,m}(\tau)$ in the bases of the associated scaling functions $\xi_{0,0}(t)$ and $\psi_{0,0}(\tau)$, respectively. The matrices $\Xi_J\in\mathbb R^{BT\times BT}$ and $\Psi_K\in\mathbb R^{F\tau_{\max}\times F\tau_{\max}}$ have maximal scales $J=\lfloor\log_2BT\rfloor$ and $K=\lfloor\log_2F\tau_{\max}\rfloor$, respectively. The operator $U_J$ is the Kronecker product of the two wavelet transforms, $U_J=\Xi_J'\otimes\Psi_K'$. The wavelet coefficients $\vec w_J=\{w_{j,k,l,m}:j=1,\ldots,J,\ k=1,\ldots,K,\ (l,m)\in\mathbb Z^2\}$ are computable via the fast wavelet transform. The operator $H$ is synthesized from $\vec w_J$ with the fast inverse wavelet transform to $J+q$ scales, with $q$ interpolation/scaling operators added by zero-appending $W_J$:

$$\vec h_B=\left(D_\xi^q\otimes I_{2^K}\right)U_J'\,\vec w_J\quad\Longleftrightarrow\quad H_B=\Psi_K\,W_J\,\Xi_J'\,D_\xi^{-q}.\tag{3}$$

This implies that $H_B$ is synthesized as constant at time scales less than $1/2^qB$. Clearly, then, $q$ must satisfy $q<\log_2F/B$, since the channel is LTI at the sampling rate. The LHS of (3) follows from paragraph 58, property 4, with $\Xi$ and $\Psi$ unitary. Compute $\vec h_B$ from $\vec w_J$ via the RHS of (3), and denote the scaling-function expansion of the channel by

$$\vec h_B^0=U_J'\,\vec w_J.\tag{4}$$

Lastly, define $\check H_B$ as the time-varying convolution matrix operator associated with the time-varying array $H_B$ or, synonymously, with its stacked representation $\vec h_B$.
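A small helper can tabulate the dimensions and the bound on $q$ above. The function name and the example parameter values are hypothetical. $BT$ and $F\tau_{\max}$ are assumed to be powers of two so that $J$ and $K$ are exact.

```python
import math

def model_dimensions(B, T, F, tau_max, q):
    """Dimensions of the model (1)-(3) for Doppler spread B (Hz), duration T (s),
    sampling rate F (Hz), delay spread tau_max (s) and interpolation depth q."""
    if not q < math.log2(F / B):
        raise ValueError("q must satisfy q < log2(F/B)")
    P = B * T            # Doppler dimension BT
    M = F * tau_max      # delay-spread dimension F*tau_max
    return {
        "P": P,
        "M": M,
        "J": math.floor(math.log2(P)),       # maximal time scale
        "K": math.floor(math.log2(M)),       # maximal delay scale
        "coefficients": P * M,
        "lti_seconds": 1.0 / (2 ** q * B),   # span over which H_B is held constant
    }
```

For instance, $B=8$ Hz, $T=4$ s, $F=1024$ Hz, $\tau_{\max}=1/32$ s and $q=5$ give $J=K=5$, 1024 coefficients, and LTI segments of $1/256$ s. The value $q=7$ is rejected, because $\log_2(F/B)=7$.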

For channels with sparse arrivals that are dispersed and non-stationary, the wavelet model offers a parsimonious representation. However, for filters with peaky features in frequency, Fourier bases or autoregressive models will be more suitable. Fourier bases modeling Doppler spread will also be optimal for the WSS channel as $T\to\infty$. Fourier bases will fail, however, to capture discontinuities or time-localized phenomena. In addition, where finite-duration segments must be modeled, boundaries impose severe penalties on Fourier methods, whereas wavelets capture these features parsimoniously. At lower frequencies, associated with scatterers having smaller Doppler spread, wavelets will offer near Fourier-like performance under the WSS condition. Nevertheless, wavelets provide a means to model abrupt changes in channel conditions and localized phenomena in time and delay, as well as to minimize boundary effects associated with finite-duration signaling. Wavelet models, of course, are not universally appropriate for delay-localized operators. For instance, for channels consisting of arrivals that have no dispersion in delay, the standard Euclidean bases (in delay) are optimal, i.e., $H(t,\tau)=\sum_n\alpha_n(t)\,\delta(\tau-\tau_n(t))$. In this case, the path delays and amplitudes (associated with the time-changing geometry of the environment) are modeled.

The following paragraphs introduce an in-scale likelihood model from which a recursive in-scale MLE follows. A sparsity prior is then assumed, and from it the in-scale posterior expectation, variance and Doppler spread $B$ estimates are derived. Empirical Bayes methods are employed throughout.

The channel response $H(t,\tau)$ of (1) with $B$ Hz Doppler spread is observed via the source $s(t)$ and received signal $r(t)$ through the linear model

$$r(t)=\int H(t,\tau)\,s(t-\tau)\,d\tau+n(t),\tag{5}$$

where $n(t)$ is a white Gaussian noise process of known power $\sigma^2F$.
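The following is a discrete sketch of (5) that uses the convolution matrix of definition 6. It assumes, as definition 6 implies, that column $n$ of the $M\times N$ array $H$ is the impulse response applied to input sample $n$. The helper names are hypothetical, and the caller passes in the noise samples rather than having the code generate them.

```python
def conv_operator(H):
    """Definition 6: H is M x N (list of M rows). Returns the (N + M - 1) x N
    matrix whose n-th column is H's n-th column shifted down by n samples."""
    M, N = len(H), len(H[0])
    Hc = [[0.0] * N for _ in range(N + M - 1)]
    for n in range(N):
        for m in range(M):
            Hc[n + m][n] = H[m][n]
    return Hc

def apply(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def tv_convolve(H, s, noise=None):
    """Discrete counterpart of (5): r = Hcheck s, plus optional noise samples."""
    r = apply(conv_operator(H), s)
    if noise is not None:
        r = [a + b for a, b in zip(r, noise)]
    return r
```

When every column of $H$ is the same, the result is ordinary convolution. For example, taps `[1, 0.5]` applied to `[1, 2, 3]` give `[1, 2.5, 4, 1.5]`. When the columns differ, each input sample sees its own response.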

Let $\mathcal N_x(\mu,\Sigma)=\sqrt{(2\pi)^{-N}|\Sigma|^{-1}}\exp\left(-(x-\mu)'\Sigma^{-1}(x-\mu)/2\right)$. Assume that $H$ is time-invariant over durations $1/2^qB$ and that $s(t)$ and $r(t)$ are suitably sampled at the Nyquist rate $F$. Let $N=F(T+\tau_{\max})$ be the received signal dimension and $M=F\tau_{\max}$ the delay spread dimension. The discrete-time model associated with (5), using the time-varying operator of paragraph 21, definition 6, is

$$p(\vec r\,|\,\vec h_B,\vec s,\sigma)=\mathcal N_{\vec r}\left(\check H_B\vec s,\,I\sigma^2\right),\tag{6}$$

where $\check H_B$ represents the $N$ by $N-M+1$ block convolution operator associated with $\vec h_B$ in (3). Synonymously, using (3), express (6) in terms of the stacked wavelet coefficients:

$$p(\vec r\,|\,\vec w_J,\vec s,\sigma)=\mathcal N_{\vec r}\left(S_{2^qB}\left(D_\xi^q\otimes I_{2^K}\right)U_J'\,\vec w_J,\,I\sigma^2\right),\tag{7}$$

where $J$ is interchangeable with $B$ as in (3). For notational simplicity, the conditioning on the families $\xi$ and $\psi$ (for time and delay, respectively) is assumed. With $P=BT$ the Doppler dimension, $S_{2^qB}$ is the $N\times2^qPM$ block convolution operator associated with the sampled source signal $\vec s$. The blocks of $S_{2^qB}$ are proportional to the coherence time $1/2^qB$ over which the channel is LTI. $S_{2^qB}$ operates on a $2^qP$-long list of stacked $M$-length LTI channel vectors to yield a channel output. To illustrate, consider $q=0$; then $S_B$ has the form

$$S_B=\begin{bmatrix}L_1&0&\cdots&\cdots&0\\X_1&0&&&\vdots\\Y_1&L_2&&&\vdots\\0&X_2&&&\vdots\\0&Y_2&\ddots&&\vdots\\\vdots&0&&\ddots&0\\\vdots&\vdots&&&L_P\\\vdots&\vdots&&&X_P\\0&0&\cdots&0&Y_P\end{bmatrix}\tag{8}$$

The $L$'s ($Y$'s) are lower (upper) triangular, each of size $M$ by $M$. The $X$'s are full and of size $F/B-M$ by $M$. Together, each $[L',X',Y']'$ block represents a time-invariant filter operator over a duration of $1/2^qB$ seconds, derived from the source signal of this block. The operator $\left(D_\xi^q\otimes I_{2^K}\right)$ maps the $N$ by $2^{K+J+q}$ block convolution operator $S_{2^qB}$ of source samples to the $N$ by $2^{K+J}$ block convolution operator of modulated blocks associated with coefficients in a scaling expansion of $\xi$. Some useful properties are exposed by considering the case of large $q$. Define the limiting matrix

$$S_B^\xi=\lim_{q\to\infty}S_{2^qB}\left(D_\xi^q\otimes I_{2^K}\right),\tag{9}$$

which operates on the stacked coefficients of a scaling-function expansion of the channel (i.e., $\vec h_B^0=U_J'\vec w_J$) to produce the expected value of the output of the time-varying filter according to (7). The following scaling equation for such block convolution operators is informative: $S_{B/2}^\xi=S_B^\xi\left(D_\xi^1\otimes I_{2^K}\right)$. Further observe that for Haar $\xi$ the scaling equation for the operator $S_B$ takes the form

$$S_B=S_{2B}\left(D_{Haar}^1\otimes I_{2^K}\right),\tag{10}$$

which requires no multiplies but exhibits abrupt discontinuities between blocks.
In practice, (9) is not computable. The upper bound on $q$ is set by the bandwidth of the source, $q<\log_2F/B$, since modeling the channel as anything but LTI over durations smaller than the sampling period is infeasible and unnecessary. For channels with reasonable regularity, $q>3$ is sufficient in practice. For instance, with $q=5$, a channel with $B$ Hz Doppler spread is modeled as LTI over durations shorter than $1/(32B)$ seconds.
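For $q=0$, the block structure of (8) can be generated directly from the source samples. This sketch assumes a boundary convention in which the channel of block $p$ acts on the source samples emitted during block $p$; the helper names are hypothetical. Under that convention, column $(p,m)$ of $S_B$ holds source block $p$ delayed by $m$ samples. $S_B$ applied to the stacked block responses therefore reproduces the time-varying convolution with a channel held constant over each block.

```python
def source_block_operator(s, P, M):
    """S_B for q = 0: an N x (P*M) matrix with N = len(s) + M - 1.

    len(s) must be a multiple of P; each block has Nc = len(s) // P samples (F/B).
    """
    Nc = len(s) // P
    N = len(s) + M - 1
    S = [[0.0] * (P * M) for _ in range(N)]
    for p in range(P):              # coherence block
        for m in range(M):          # delay tap
            for i in range(Nc):     # source sample within the block
                S[p * Nc + i + m][p * M + m] = s[p * Nc + i]
    return S

def apply(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]
```

Take `s = [1, 2, 3, 4]` with two blocks, block responses `[1, 0.5]` and `[2, 0]` stacked as `[1, 0.5, 2, 0]`. The output is `[1, 2.5, 7, 8, 0]`, the same as convolving each source sample with the response of its own block.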

Returning to (7), the implied likelihood function on $\vec w_J=U_J\vec h_B^0$ is

$$p\left(\vec r\,|\,\vec w_J,\vec s,J=\log_2BT\right)\propto\mathcal N_{\vec w_J}\left(\tilde{\vec w}_J,\Delta_J\right),\qquad\tilde{\vec w}_J=U_J\tilde{\vec h}_B^0,\quad\tilde{\vec h}_B^0=\left(C_B^\xi\right)^{-1}S_B^{\xi\prime}\vec r,\quad\Delta_J=\sigma^2U_J\left(C_B^\xi\right)^{-1}U_J',\tag{11}$$

where the $2^{K+J}\times2^{K+J}$ matrix $C_B^\xi=S_B^{\xi\prime}S_B^\xi$ is the source covariance matrix associated with the time-varying filter model. To illustrate, consider again the simple case $q=0$ and Haar $\xi$. It follows that $C_B=S_B'S_B$, which takes the form

$$C_B=\begin{bmatrix}T_1&V_1&0&\cdots&0\\V_1'&T_2&\ddots&\ddots&\vdots\\0&\ddots&\ddots&\ddots&0\\\vdots&\ddots&\ddots&T_{P-1}&V_{P-1}\\0&\cdots&0&V_{P-1}'&T_P\end{bmatrix}\tag{12}$$

with $T_p$ an $M\times M$ Toeplitz matrix, $V_p$ lower triangular Hankel with zeros on the main diagonal and rank $M-1$, and $P=BT$. $C_B$ is represented either by the $2P$ vectors of length $M$ corresponding to the first columns of $T_p$ and $V_p$, or by the $2M+1$ diagonals of $C_B$, each of length $PM$. Thus $C_B$ is a non-Toeplitz, banded Hermitian matrix with $2M+1$ nonzero diagonals. For $\xi\neq$ Haar, $C_B^\xi$ differs from $C_B$ in that its band is wider than $2M+1$.
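The banded structure of (12) and the diagonal storage used by the CG iteration can be sketched as follows. The $S_B$ construction assumes that the channel of block $p$ acts on the source samples emitted during block $p$. With that convention, only diagonals with $|i-j|\le M-1$ are nonzero, which lies inside the $2M+1$ band quoted above. The helper names and the example source are hypothetical.

```python
def source_block_operator(s, P, M):
    """S_B for q = 0 (N x PM); block p's channel acts on samples emitted in block p."""
    Nc, N = len(s) // P, len(s) + M - 1
    S = [[0.0] * (P * M) for _ in range(N)]
    for p in range(P):
        for m in range(M):
            for i in range(Nc):
                S[p * Nc + i + m][p * M + m] = s[p * Nc + i]
    return S

def gram(S):
    """C_B = S_B' S_B, formed densely here only for checking."""
    n = len(S[0])
    return [[sum(row[i] * row[j] for row in S) for j in range(n)] for i in range(n)]

def to_diagonals(C, w):
    """Keep diagonals d = -w..w: diags[d][i] = C[i][i+d] (zero outside the matrix)."""
    n = len(C)
    return {d: [C[i][i + d] if 0 <= i + d < n else 0.0 for i in range(n)]
            for d in range(-w, w + 1)}

def banded_matvec(diags, x):
    """y = C x from the stored diagonals only: about (2w+1) * n multiplies."""
    n = len(x)
    y = [0.0] * n
    for d, vals in diags.items():
        for i in range(n):
            if 0 <= i + d < n:
                y[i] += vals[i] * x[i + d]
    return y

s = [1, -1, 2, 1, -2, 3, 1, 1, -1, 2, 0, 1]   # example source, P = 3 blocks of 4
P, M = 3, 3
C = gram(source_block_operator(s, P, M))
```

The resulting $C_B$ is symmetric, vanishes outside the band, and the banded multiply agrees exactly with the dense product for integer-valued data.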

The vector $\vec b_B=S_B^{\xi\prime}\vec r$ in (11) is a block-overlapped cross-correlation of the source and received data vectors associated with the windowing scaling function $\xi_{0,0}(t)$. The weakness of the in-scale likelihood approach is simply that the parameter set grows exponentially with scale (i.e., $PM\propto2^J$), a severe penalty for a linear model. Nevertheless, for the very limited set of spectrally flat signals, the approach has some computational merit. Consider signals for which $\|C_B^\xi-C_0I_{PM}\|$ is small for some scalar $C_0$, e.g., Gaussian white signals and maximal length sequences. For these, $\|\vec b_B/C_0-\vec h_B\|$ will be small as well, and close approximations to $\vec h_B$ are attained with the fast block correlation

$$\vec b_B=S_B^{\xi\prime}\vec r.$$
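The block cross-correlation $\vec b_B=S_B'\vec r$ for $q=0$ can be computed without forming $S_B$. This sketch assumes a block structure consistent with (8), in which block $p$'s channel acts on the source samples emitted during block $p$, and the function name is hypothetical. It correlates directly in $O(PN_cM)$ operations. The $O(PN_c\log N_c)$ cost quoted in Table 1 presumably relies on FFT-based correlation, which is not shown here.

```python
def block_xcorr(s, r, P, M):
    """b[p*M + m] = sum_i s[p*Nc + i] * r[p*Nc + i + m], i.e. S_B' r for q = 0."""
    Nc = len(s) // P
    if len(r) < len(s) + M - 1:
        raise ValueError("r must have at least len(s) + M - 1 samples")
    return [sum(s[p * Nc + i] * r[p * Nc + i + m] for i in range(Nc))
            for p in range(P) for m in range(M)]
```

Consider a received impulse at sample 1 with source `[1, 2, 3, 4]` split into two blocks and two lags. Only the first block correlates with it, giving `[2, 1, 0, 0]`.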

The in-scale computation is done using the conjugate gradient method. A direct computation of $\tilde{\vec h}_B^0$ in (11) requires $O((PM)^3)$ operations, a computationally impractical feat. A feasible approach assumes that $\tilde{\vec h}_{B/2}^0$ is known to a tolerable precision and computes $\tilde{\vec h}_B^0$ starting from $\left(D_\xi^1\otimes I_{2^K}\right)\tilde{\vec h}_{B/2}^0$ via the conjugate gradient (CG) method. Initialization requires only $O(M^2)$, since for $J=0$ the system is Toeplitz. The CG algorithm has found wide application in solving banded and sparse systems; indeed, it has been shown to be synonymous with the multistage Wiener filter. Table 1 lists the necessary operations along with the associated computational demands. Note that only a single multiplication by the sparse banded matrix $C_B^\xi$ is required per iteration. This multiply is accomplished efficiently from the $2M+1$ stored diagonals of length $PM$. In this framework, the inversion of large matrices is avoided by solving problems of half the dimension to a given precision and then using these as starting points for the next higher hypothesized dimension. When the CG algorithm is iterated to machine precision, the resulting maximum likelihood estimate is termed $\vec h_B^{mle}$ and is computed via interpolation from $\tilde{\vec h}_B^0$ as

$$\vec h_B^{mle}=\left(D_\xi^q\otimes I_{2^K}\right)\tilde{\vec h}_B^0\quad\Longleftrightarrow\quad H_B^{mle}=\left[\tilde{\vec h}_B^0\right]^{ist}D_\xi^{-q}.$$
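The following textbook conjugate gradient solver with a warm start illustrates the in-scale idea. The interpolated half-scale solution is passed as `x0`, and each iteration costs one multiplication by the system matrix. That matrix is supplied as a callable, so a banded multiply can be plugged in. This is the standard CG recursion, not a transcription of Table 1. Its stopping rule is a plain residual tolerance rather than the rule discussed next, and the names are hypothetical.

```python
def conjugate_gradient(matvec, b, x0, tol=1e-10, max_iter=None):
    """Standard CG for a symmetric positive-definite system C x = b.

    matvec(x) must return C x; x0 is the warm start (e.g. the interpolated
    lower-scale estimate). Returns (solution, iterations used).
    """
    n = len(b)
    max_iter = 10 * n if max_iter is None else max_iter
    x = list(x0)
    r = [bi - ci for bi, ci in zip(b, matvec(x))]   # residual b - C x
    d = list(r)                                     # first search direction
    rr = sum(v * v for v in r)
    iters = 0
    while iters < max_iter and rr ** 0.5 > tol:
        f = matvec(d)                               # the one multiply per iteration
        alpha = rr / sum(di * fi for di, fi in zip(d, f))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * fi for ri, fi in zip(r, f)]
        rr_new = sum(v * v for v in r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]  # C-conjugate update
        rr = rr_new
        iters += 1
    return x, iters
```

On a small SPD system, CG from zero converges in at most $n$ iterations. A warm start at the exact solution needs none, which is the saving the in-scale recursion aims for when the lower-scale estimate is already close.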

The CG algorithm also allows flexibility in the number of iterations. Computing $\vec{h}_{B}^{mle}$ to machine precision may not be necessary, and a solution slightly more coherent (closer to $\vec{h}_{B/2}^{mle}$) may be tolerable. To save computation, iterate only until the solution lies within a ball of radius proportional to the covariance of the MLE, and call such a solution $\vec{h}_{B}^{a}$, an approximate MLE. An approximate, ad-hoc approach is taken to determine this iteration number. Define κ as the ratio of the maximum to the minimum eigenvalue of $C_{B}^{\xi}$ (its condition number). Then at iteration $i^{a}$, where $\vec{h}_{B}^{\{i^{a}\}} = \vec{h}_{B}^{a}$, the CG solution is within a ball of radius

$\left\| \vec{h}_{B}^{a} - \vec{h}_{B}^{mle} \right\| = \left\| \sum_{i=i^{a}+1}^{PM} \rho_{i}\vec{d}_{i} \right\| \leq 2\left\|\vec{h}_{B}^{mle}\right\|\sqrt{\kappa}\,\omega^{i^{a}}, \qquad \omega = \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}.$

The computation of κ is not trivial and must be approximated for the class of source signals under consideration. Moreover, in the in-scale computation proposed here the extreme eigenvalues of $C_{B}^{\xi}$ would have to be approximated at each B; at present this is impractical, and an approximation is used for ω. The CG implementation presented makes an additional ad-hoc assumption to eliminate $\|\vec{h}_{B}^{mle}\|$ from the stopping criterion, based on the observation that CG gives fairly smooth exponential convergence because the dominant eigenvectors do not dominate the convergence. For this reason, in practice the tightness of the bound $2\|\vec{h}_{B}^{mle}\|\sqrt{\kappa}\,\omega^{i^{a}}$ does not vary greatly from iteration to iteration. Express the radius as

$\left\| \sum_{i=i^{a}+1}^{PM} \rho_{i}\vec{d}_{i} \right\| = G\left\| \rho_{i^{a}+1}\vec{d}_{i^{a}+1} \right\| + \left\| \sum_{i=i^{a}+2}^{PM} \rho_{i}\vec{d}_{i} \right\|$

for some constant 0<G<1 (since the innovation components are not orthogonal). Assuming the convergence is fairly well behaved, approximate

$\left\| \rho_{i^{a}+1}\vec{d}_{i^{a}+1} \right\| \approx \left\| \vec{h}_{B}^{a} - \vec{h}_{B}^{mle} \right\|(1-\omega).$

A common scenario is that of a white, non-fading source signal with variance ζ². The variance of each coefficient $w_{j,k,l,m}$ is then simply $\delta_{J}^{2} = \Delta_{j,k} = (\sigma^{2}/\zeta^{2}) \times B/F$. With $PM = BTF\tau_{\max}$ coefficients, this implies that $\vec{w}_{J}$ has total variance $PB\tau_{\max}/\mathrm{SNR}$, where $\mathrm{SNR} = \zeta^{2}/\sigma^{2}$. In this scenario, with G=10, iterate while $\|\rho_{i+1}\vec{d}_{i+1}\| > 10(1-\omega)PB\tau_{\max}/\mathrm{SNR}$. A conservative approximation κ=1000 can be made, corresponding to $\omega = (\sqrt{1000}-1)/(\sqrt{1000}+1) \approx 0.94$.
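The arithmetic of this stopping rule can be checked with a short Python sketch. It evaluates ω from an assumed condition number κ and the white-source threshold $G(1-\omega)PB\tau_{\max}/\mathrm{SNR}$ on $\|\rho_{i+1}\vec{d}_{i+1}\|$. The function names are hypothetical, and G defaults to the value 10 quoted above.

```python
import math


def cg_contraction_factor(kappa: float) -> float:
    """omega = (sqrt(kappa) - 1) / (sqrt(kappa) + 1) for a condition number kappa >= 1."""
    if kappa < 1.0:
        raise ValueError("kappa is a condition number (max/min eigenvalue) and must be >= 1")
    root = math.sqrt(kappa)
    return (root - 1.0) / (root + 1.0)


def white_source_threshold(B: float, T: float, tau_max: float, snr: float,
                           kappa: float, G: float = 10.0) -> float:
    """Threshold on ||rho_{i+1} d_{i+1}|| for a white, non-fading source:
    G * (1 - omega) * P * B * tau_max / SNR, with P = B * T."""
    omega = cg_contraction_factor(kappa)
    P = B * T
    return G * (1.0 - omega) * P * B * tau_max / snr


omega = cg_contraction_factor(1000.0)
print(f"omega for kappa = 1000: {omega:.4f}")  # prints 0.9387
```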

TABLE 1 Computation of $\tilde{\vec{w}}_{J}$ via conjugate gradient. Here $P = BT$, $M = F\tau_{\max}$, $N_{c} = F/B$ and $m_{0} \propto \mathrm{length}(g^{\xi})$.

| Step | Cost |
|---|---|
| Initialize $\vec{x}_{0} = (D_{\xi}^{1} \otimes I_{2^{K}})\tilde{\vec{h}}_{B/2}^{0}$ | $O(PM \times m_{0}^{2})$ |
| $C_{B}$, $\vec{b}_{B} = S_{B}'\vec{r}$ | $O(PN_{c}\log N_{c})$ |
| $\vec{e}_{0} = \vec{b} - C_{B}\vec{x}_{0}$, $\vec{v}_{0} = C_{B}\vec{x}_{0}$, $\vec{d}_{0} = \vec{e}_{0}$, $\vec{f}_{0} = C_{B}\vec{d}_{0}$ | $O(PM^{2})$ |
| while $\lVert\rho_{i}\vec{d}_{i}\rVert^{2} > (1-\omega)PB\tau_{\max}\delta_{J}^{2}$ | |
| $\eta = \vec{e}_{i}'\vec{f}_{i}/\vec{d}_{i}'\vec{f}_{i}$, $\vec{d}_{i+1} = \vec{e}_{i} - \eta\vec{d}_{i}$ | $O(PM)$ |
| $\vec{f}_{i+1} = C_{B}\vec{d}_{i+1}$ | $O(PM^{2})$ |
| $\rho_{i+1} = \vec{e}_{i}'\vec{d}_{i+1}/\vec{d}_{i+1}'\vec{f}_{i+1}$, $\vec{v}_{i+1} = \vec{v}_{i} + \rho_{i+1}\vec{f}_{i+1}$ | $O(PM)$ |
| $\vec{x}_{i+1} = \vec{x}_{i} + \rho_{i+1}\vec{d}_{i+1}$, $\vec{e}_{i+1} = \vec{b} - \vec{v}_{i+1}$ | $O(PM)$ |
| $i = i + 1$ | |
| end | |
| $\tilde{\vec{h}}_{B} = \vec{x}_{i+1}$, $\tilde{\vec{w}}_{J} = U_{J}\vec{x}_{i+1}$ | $O(PM \times m_{0}^{2})$ |
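A minimal NumPy sketch of a warm-started conjugate gradient solve with a step-norm stopping rule of the kind used in Table 1 follows. It is written in the standard Hestenes-Stiefel form rather than transcribing Table 1 line by line. Table 1 tracks $\vec{v}_{i} = C_{B}\vec{x}_{i}$ and forms $\vec{e}_{i} = \vec{b}-\vec{v}_{i}$, which is algebraically the same residual update as the one used here. The dense matrix `C`, the function name and the test problem are assumptions standing in for the banded $C_{B}$ and the interpolated lower-scale estimate.

```python
import numpy as np


def cg_warm_start(C, b, x0, threshold, max_iter=None):
    """Solve C x = b for symmetric positive definite C by conjugate gradient.

    Starts from the warm start x0 (e.g. an interpolated lower-scale estimate)
    and stops once the squared step norm ||rho_i d_i||^2 is <= threshold.
    Returns the iterate and the number of CG steps taken.
    """
    x = np.array(x0, dtype=float)
    e = b - C @ x                      # residual e_0 = b - C x_0
    d = e.copy()                       # first search direction
    max_iter = b.size if max_iter is None else max_iter
    steps = 0
    for _ in range(max_iter):
        f = C @ d                      # f_i = C d_i, one matrix-vector product per step
        curvature = d @ f
        if curvature <= 0.0:           # d = 0: the residual has vanished
            break
        rho = (e @ e) / curvature      # exact line search along d
        x += rho * d
        steps += 1
        e_next = e - rho * f           # equals b - C x without another product
        if rho * rho * (d @ d) <= threshold:
            break
        d = e_next + ((e_next @ e_next) / (e @ e)) * d   # C-conjugate direction
        e = e_next
    return x, steps


# Tridiagonal SPD test matrix, a stand-in for the banded C_B.
n = 50
C = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
rng = np.random.default_rng(0)
x_true = rng.standard_normal(n)
b = C @ x_true

x_cold, steps_cold = cg_warm_start(C, b, np.zeros(n), threshold=1e-20)
x_warm, steps_warm = cg_warm_start(C, b, x_true + 1e-3 * rng.standard_normal(n), threshold=1e-20)
print(steps_cold, steps_warm)
```

Starting close to the solution reduces the number of steps needed to reach the same step-norm threshold. That saving is what initializing from the interpolated scale-B/2 estimate is meant to provide.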

The Gaussian (normal) mixture can be chosen as a sparsity prior for the Doppler-spread channel. The previous maximum likelihood in-scale estimate of the channel assumes a priori that all delay lags of the channel at each time contribute equally to the output. Equivalently, the likelihood model carries an implicit prior of large and equal variance for every time-delay component of the channel operator. This assumption is simplistic for sparse channels, and a modification of the likelihood approach is warranted. The chosen prior should give the model the flexibility to select the components associated with frequency selectivity and Doppler spread (time modulation) that are significant and to reject those that are not. This problem is equivalent to variable selection, and mixture prior assignments have been useful for such problems in both the Bayesian and empirical Bayes methodologies. For this reason the binary discrete mixture normal model $p(w \mid \pi,\lambda,\varepsilon) = \mathcal{M}(\pi,\lambda,\varepsilon)$,

$w \mid \pi,\lambda,\varepsilon \sim \pi\,\mathcal{N}_{w}(0,\lambda^{2}) + (1-\pi)\,\mathcal{N}_{w}(0,\varepsilon^{2}) \qquad (13)$

is chosen as a prior. Specify the prior conditional density of $\vec{w}_{J}$ as

$\vec{w}_{J} \mid \pi_{j,k},\lambda_{j,k},\varepsilon_{j,k} \sim \prod_{j,k}^{J,K}\prod_{l,m}^{L_{j},M_{k}} \mathcal{M}_{w_{j,k,l,m}}(\pi_{j,k},\lambda_{j,k},\varepsilon_{j,k}). \qquad (14)$

The mixture normal model is useful because it captures sparsity. Each significant arrival path associated with an operator is described by a normal model with variance λ², while delays at which no arrival energy is present are modeled with a small variance ε². The hyperparameters needed to specify the model are collected in a single parameter $\vec{\phi}_{J} = \{\pi_{j,k},\lambda_{j,k},\varepsilon_{j,k}\}_{j \leq J,\,k \leq K}$. With the prior and likelihood specified, the resulting posterior distribution, to be maximized jointly over $\vec{w}_{J}$ and J (equivalently $\vec{h}_{B}$ and B) and the hyperparameters $\vec{\phi}_{J}$, is

$p(\vec{w},\vec{\phi},J \mid \vec{r},\vec{s},\sigma^{2}) \propto \mathcal{N}_{\vec{w}_{J}}\big(\tilde{\vec{w}}_{J}(\vec{r},\vec{s}),\,\Delta_{J}(\vec{s},\sigma^{2})\big) \times p(\vec{w} \mid \vec{\phi},\sigma^{2},J) \times p(\vec{\phi} \mid J) \times p(J). \qquad (15)$

The first term, the likelihood, assumes independence of the wavelet coefficients via the near-diagonalizing property of $U_{J}$ on the source covariance $C_{B}^{\xi}$; the second term, the prior, is predicated on this assumption.
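To make the sparsity prior (13) concrete, the following sketch draws iid coefficients from the two-component mixture and evaluates its log density. The function names and the parameter values in the example are illustrative assumptions, not values from the simulations.

```python
import numpy as np


def sample_mixture_prior(n, pi, lam, eps, rng):
    """Draw n iid coefficients from pi*N(0, lam^2) + (1 - pi)*N(0, eps^2).

    Returns the coefficients and a boolean mask marking the draws that came
    from the large-variance ("significant path") component.
    """
    significant = rng.random(n) < pi
    scale = np.where(significant, lam, eps)
    return scale * rng.standard_normal(n), significant


def mixture_logpdf(w, pi, lam, eps):
    """log[pi*N(w; 0, lam^2) + (1 - pi)*N(w; 0, eps^2)], computed stably."""
    w = np.asarray(w, dtype=float)
    log_big = np.log(pi) - 0.5 * np.log(2 * np.pi * lam**2) - 0.5 * w**2 / lam**2
    log_small = np.log1p(-pi) - 0.5 * np.log(2 * np.pi * eps**2) - 0.5 * w**2 / eps**2
    return np.logaddexp(log_big, log_small)


rng = np.random.default_rng(1)
w, significant = sample_mixture_prior(200_000, pi=0.1, lam=1.0, eps=0.01, rng=rng)
print(significant.mean(), w.var())  # close to 0.1 and 0.1*1 + 0.9*1e-4
```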

For ease of notation let *=l, m represent an arbitrary wavelet coefficient at time-delay location associated with l, m according to the model of Equation (1) so that w_(j,k,*)=w_(j,k,l,m). In the spirit of empirical Bayes consider the posterior distribution of wavelet coefficients given all other parameters. This conditional density is shown in Equation 33 to be

$w_{j,k,*} \mid \tilde{w}_{j,k,*},\sigma^{2},\vec{\phi}_{J} \sim \Pi(\tilde{w}_{j,k,*})\,\mathcal{N}_{w_{j,k,*}}\big(\gamma(\lambda_{j,k})\tilde{w}_{j,k,*},\,\gamma(\lambda_{j,k})\delta_{j,k}^{2}\big) + \big(1-\Pi(\tilde{w}_{j,k,*})\big)\,\mathcal{N}_{w_{j,k,*}}\big(\gamma(\varepsilon_{j,k})\tilde{w}_{j,k,*},\,\gamma(\varepsilon_{j,k})\delta_{j,k}^{2}\big) \qquad (16)$

where the posterior mixture coefficient $\Pi(\tilde{w}_{j,k,*})$ and the gains γ(·) are defined in the derivation of Equation 33. This posterior (16) has the same mixture normal form as the prior (13). There are, however, two important distinctions. First, the posterior mixture coefficients Π(·) are functions of each individual empirical coefficient $\tilde{w}_{j,k,l,m}$. Second, the means of the two mixture components are not equal. For this reason the posterior density is not symmetric, and its first moment $\hat{w} = E[w \mid \tilde{w},\vec{\phi}_{J}]$ is in general not equal to the argument maximizing (16).

Since the posterior is a mixture, its mean is simple to compute:

$\hat{w}_{j,k,*} = E\left[w_{j,k,*} \mid \tilde{w}_{j,k,*},\sigma^{2},\vec{\phi}_{J}\right] = \left[\Pi(\tilde{w}_{j,k,*})\,\gamma(\lambda_{j,k}) + \big(1-\Pi(\tilde{w}_{j,k,*})\big)\,\gamma(\varepsilon_{j,k})\right]\tilde{w}_{j,k,*}. \qquad (17)$

The resulting operator $\hat{\vec{w}}_{J} = E[\vec{w} \mid \tilde{\vec{w}}_{J},\sigma^{2},\vec{\phi}_{J}]$ attenuates smaller coefficients and leaves larger ones relatively unchanged. It is similar in form to a Wiener filter, with the notable difference that it applies a mixture of Wiener gains, each gain weighted by the empirical posterior probability of the associated model in the mixture.
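The shrinkage rule (17) is straightforward to compute coefficientwise. The sketch below assumes, consistent with the marginal derived for Equation 33, that $\Pi(\tilde{w})$ is the posterior probability of the large-variance component under $\pi\mathcal{N}(0,\lambda^{2}+\delta^{2})+(1-\pi)\mathcal{N}(0,\varepsilon^{2}+\delta^{2})$ and that $\gamma(x)=x^{2}/(x^{2}+\delta^{2})$. It works with variances (`lam2`, `eps2`, `delta2`) and evaluates Π in the log domain to avoid underflow. The helper names are hypothetical.

```python
import numpy as np


def log_normal_pdf(x, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + x * x / var)


def posterior_weight(w_tilde, pi, lam2, eps2, delta2):
    """Pi(w~): posterior probability that w~ came from the large-variance component."""
    w_tilde = np.asarray(w_tilde, dtype=float)
    logit = (np.log(pi) - np.log1p(-pi)
             + log_normal_pdf(w_tilde, lam2 + delta2)
             - log_normal_pdf(w_tilde, eps2 + delta2))
    return 0.5 * (1.0 + np.tanh(0.5 * logit))    # numerically stable logistic function


def posterior_mean(w_tilde, pi, lam2, eps2, delta2):
    """Eq. (17): a Pi-weighted mixture of the two Wiener-like gains applied to w~."""
    g_lam = lam2 / (lam2 + delta2)               # gamma(lambda)
    g_eps = eps2 / (eps2 + delta2)               # gamma(epsilon)
    p = posterior_weight(w_tilde, pi, lam2, eps2, delta2)
    return (p * g_lam + (1.0 - p) * g_eps) * np.asarray(w_tilde, dtype=float)


hyper = dict(pi=0.1, lam2=1.0, eps2=1e-4, delta2=0.01)
print(posterior_mean([2.0, 0.01], **hyper))  # large coefficient kept, small one shrunk
```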

The posterior mean of the channel $\vec{h}_{B}$ sampled at the rate $2^{q}B$ Hz follows directly from (17) and the linearity of the wavelet transform as

$\hat{\vec{h}}_{B} = E[\vec{h} \mid \vec{r},\vec{s},\sigma^{2},\vec{\phi}_{J}] = (D_{\xi}^{q} \otimes I_{2^{K}})\,U_{J}'\,\hat{\vec{w}}_{J} \;\Longleftrightarrow\; \hat{H}_{B} = \Psi_{K}\,[\hat{\vec{w}}_{J}]^{ist}\,\Xi_{J}'\,D_{\xi}^{-q}. \qquad (18)$

The posterior mean is useful because, for the ensemble of channels associated with the mixture prior on wavelet coefficients, no other estimator has smaller mean square error. It is, however, a biased estimator. The MAP estimate for the posterior is computationally intensive, requiring iterations over every wavelet coefficient, and for this reason an approximate MAP is worth considering.

$w_{j,k,*}^{amap} = \begin{cases} \gamma(\lambda_{j,k})\,\tilde{w}_{j,k,*} & \text{if } \Pi(\tilde{w}_{j,k,*}) > 1/2 \\ \gamma(\varepsilon_{j,k})\,\tilde{w}_{j,k,*} & \text{if } \Pi(\tilde{w}_{j,k,*}) < 1/2 \end{cases} \qquad (19)$

This rule is a simple approximation; it has been tested against the actual MAP, computed via Nelder-Mead iteration, and gives comparable MSE performance on simulated channels.
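Rule (19) replaces the soft weighting of (17) with a hard decision on $\Pi(\tilde{w})$. The self-contained sketch below repeats the hypothetical `posterior_weight` helper. Rule (19) leaves ties at $\Pi = 1/2$ unspecified; here they are assigned to the ε branch as an arbitrary choice.

```python
import numpy as np


def log_normal_pdf(x, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + x * x / var)


def posterior_weight(w_tilde, pi, lam2, eps2, delta2):
    """Pi(w~), computed as a logistic function of the log-likelihood ratio."""
    logit = (np.log(pi) - np.log1p(-pi)
             + log_normal_pdf(w_tilde, lam2 + delta2)
             - log_normal_pdf(w_tilde, eps2 + delta2))
    return 0.5 * (1.0 + np.tanh(0.5 * logit))


def approximate_map(w_tilde, pi, lam2, eps2, delta2):
    """Eq. (19): apply gamma(lambda) if Pi(w~) > 1/2, otherwise gamma(epsilon)."""
    w_tilde = np.asarray(w_tilde, dtype=float)
    g_lam = lam2 / (lam2 + delta2)
    g_eps = eps2 / (eps2 + delta2)
    keep = posterior_weight(w_tilde, pi, lam2, eps2, delta2) > 0.5
    return np.where(keep, g_lam * w_tilde, g_eps * w_tilde)


print(approximate_map([2.0, 0.01], pi=0.1, lam2=1.0, eps2=1e-4, delta2=0.01))
```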

The conditional variance of the channel $\vec{h}_{B}$ is computable directly from the posterior density (16) and gives a measure of the uncertainty associated with the channel after observation of the data. The derivation of Equation 34 proves the following:

$v_{j,k,*}^{2} = \mathrm{var}\left[w_{j,k,*} \mid \tilde{w}_{j,k,*},\vec{\phi}_{J}\right] = \Big(\Pi(\tilde{w}_{j,k,*})\,\gamma(\lambda_{j,k}) + \big(1-\Pi(\tilde{w}_{j,k,*})\big)\,\gamma(\varepsilon_{j,k})\Big)\delta_{j,k}^{2} + \Pi(\tilde{w}_{j,k,*})\big(1-\Pi(\tilde{w}_{j,k,*})\big)\big(\gamma(\lambda_{j,k})-\gamma(\varepsilon_{j,k})\big)^{2}\,\tilde{w}_{j,k,*}^{2}. \qquad (20)$

The conditional variance of each coefficient is thus the sum of two components. The first is a weighted average of the variances associated with the two models of the mixture. The second represents the uncertainty of the coefficient being associated definitively with either of the two models and is maximal for $\Pi(\tilde{w}) = 1/2$. For this reason large variance is associated with coefficients of magnitude approximately

$\tilde{w}_{j,k,*} = \pm\sqrt{\frac{(\lambda_{j,k}^{2}+\delta_{j,k}^{2})(\varepsilon_{j,k}^{2}+\delta_{j,k}^{2})}{\lambda_{j,k}^{2}-\varepsilon_{j,k}^{2}}\,\log\!\left(\frac{\lambda_{j,k}^{2}+\delta_{j,k}^{2}}{\varepsilon_{j,k}^{2}+\delta_{j,k}^{2}}\cdot\frac{(1-\pi_{j,k})^{2}}{\pi_{j,k}^{2}}\right)},$

the magnitude at which $\Pi(\tilde{w}_{j,k,*}) = 1/2$. Define $\vec{v}_{J}$ as the vector of posterior variances (20) associated with $\vec{w}_{J}$; then, under the assumption that the wavelet coefficients are uncorrelated, $\mathrm{cov}[\vec{w}_{J}] = \mathrm{Diag}[\vec{v}_{J}]$. Using (3) and paragraph 58, property 2, the posterior covariance of $\vec{h}_{B}$ is then

$\Gamma_{B} = \mathrm{cov}[\vec{h}_{B}] = (D_{\xi}^{q} \otimes I_{2^{K}})\,U_{J}'\,\mathrm{Diag}(\vec{v}_{J})\,U_{J}\,(D_{\xi}^{-q} \otimes I_{2^{K}}) = (D_{\xi}^{q}\Xi_{J} \otimes \Psi_{K})\,\mathrm{Diag}(\vec{v}_{J})\,(\Xi_{J}'D_{\xi}^{-q} \otimes \Psi_{K}'). \qquad (21)$

Determine the variance of $H_{B}$ at any given t, τ by computing $V_{B} = [\mathrm{diag}(\Gamma_{B})]^{ist}$, using paragraph 58 to express $\mathrm{diag}(\Gamma_{B}) = [(D_{\xi}^{q}\Xi_{J} \odot D_{\xi}^{q}\Xi_{J}) \otimes (\Psi_{K} \odot \Psi_{K})]\,\vec{v}_{J}$. This implies that the posterior variance field $V_{B}$ defined over t, τ is

$V_{B} = (\Psi_{K} \odot \Psi_{K})\,[\vec{v}_{J}]^{ist}\,(D_{\xi}^{q}\Xi_{J} \odot D_{\xi}^{q}\Xi_{J}). \qquad (22)$

Since the operators $(\Psi_{K} \odot \Psi_{K})$ and $(D_{\xi}^{q}\Xi_{J} \odot D_{\xi}^{q}\Xi_{J})$ do not obey a scaling recursion, they must be computed directly, which is impractical for channels of high dimension $BT \times F\tau_{\max}$. For this reason a fast approximate (1−α) confidence interval for $H_{B}$ is worth pursuing. Proceed as follows; define

$\vec{w}_{J}^{\alpha} = \hat{\vec{w}}_{J} + z_{\alpha}\sqrt{\vec{v}_{J}} \quad\text{and}\quad \vec{w}_{J}^{1-\alpha} = \hat{\vec{w}}_{J} - z_{\alpha}\sqrt{\vec{v}_{J}}$ (square root taken elementwise) and transform via (3). Here $P[z > z_{\alpha}] = \alpha/2$ with $z \sim \mathcal{N}(0,1)$. These wavelet-domain bounds produce an approximate EB high-probability region for the channel as

$P\left[\min\big(\hat{H}_{B}^{\alpha}(t,\tau),\hat{H}_{B}^{1-\alpha}(t,\tau)\big) < H(t,\tau) < \max\big(\hat{H}_{B}^{\alpha}(t,\tau),\hat{H}_{B}^{1-\alpha}(t,\tau)\big)\right] \approx 1-\alpha, \qquad \hat{H}_{B}^{\alpha} = \Psi_{K}[\vec{w}_{J}^{\alpha}]^{ist}\,\Xi_{J}'D_{\xi}^{-q}, \quad \hat{H}_{B}^{1-\alpha} = \Psi_{K}[\vec{w}_{J}^{1-\alpha}]^{ist}\,\Xi_{J}'D_{\xi}^{-q}. \qquad (23)$
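Equations (20) and (23) can be sketched together. The posterior variance of each coefficient follows from the two-component posterior, and the wavelet-domain bounds are mapped through a synthesis operator. Here the synthesis step of (3) is represented by an arbitrary linear callable `synthesize`, an assumption standing in for $\Psi_{K}[\cdot]^{ist}\Xi_{J}'D_{\xi}^{-q}$. The quantile z uses the two-sided convention $P[z>z_{\alpha}]=\alpha/2$. The helper names are hypothetical. The example also checks numerically that the crossover magnitude given after (20) is where $\Pi(\tilde{w}) = 1/2$.

```python
from statistics import NormalDist

import numpy as np


def log_normal_pdf(x, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + x * x / var)


def posterior_weight(w_tilde, pi, lam2, eps2, delta2):
    """Pi(w~), computed as a logistic function of the log-likelihood ratio."""
    logit = (np.log(pi) - np.log1p(-pi)
             + log_normal_pdf(w_tilde, lam2 + delta2)
             - log_normal_pdf(w_tilde, eps2 + delta2))
    return 0.5 * (1.0 + np.tanh(0.5 * logit))


def posterior_mean_and_variance(w_tilde, pi, lam2, eps2, delta2):
    """Eqs. (17) and (20): mixture mean, and within-model plus between-model variance."""
    w_tilde = np.asarray(w_tilde, dtype=float)
    g_lam, g_eps = lam2 / (lam2 + delta2), eps2 / (eps2 + delta2)
    p = posterior_weight(w_tilde, pi, lam2, eps2, delta2)
    mean = (p * g_lam + (1.0 - p) * g_eps) * w_tilde
    var = ((p * g_lam + (1.0 - p) * g_eps) * delta2
           + p * (1.0 - p) * (g_lam - g_eps) ** 2 * w_tilde**2)
    return mean, var


def credible_band(w_hat, v, alpha, synthesize):
    """Eq. (23): map w_hat +/- z_alpha*sqrt(v) through a linear synthesis operator."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # P[z > z_alpha] = alpha / 2
    h_plus = synthesize(w_hat + z * np.sqrt(v))
    h_minus = synthesize(w_hat - z * np.sqrt(v))
    return np.minimum(h_plus, h_minus), np.maximum(h_plus, h_minus)


hyper = dict(pi=0.1, lam2=1.0, eps2=1e-4, delta2=0.01)
# Magnitude at which Pi(w~) = 1/2, i.e. where the between-model term peaks.
a, b = hyper["lam2"] + hyper["delta2"], hyper["eps2"] + hyper["delta2"]
w_star = np.sqrt(a * b / (a - b) * np.log(a / b * ((1 - hyper["pi"]) / hyper["pi"]) ** 2))
print(posterior_weight(w_star, **hyper))   # 0.5 up to rounding
```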

In keeping with the empirical Bayes methodology, estimate the hyperparameters $\vec{\phi}_{J} = \{\pi_{j,k},\lambda_{j,k},\varepsilon_{j,k}\}$ as follows. First, for $\pi_{j,k}$ use Donoho's threshold argument, namely that iid $z_{i} \sim \mathcal{N}(0,\delta^{2})$ implies $\lim_{n\to\infty} P[\max_{i \leq n}|z_{i}| > \delta\sqrt{2\log n}] = 0$. Let $\vec{1}_{S}(x) = 1$ for $x \in S$ and 0 otherwise. With the total number of coefficients at the (j,k)th scale denoted $N_{j,k} = 2^{K-k+J-j}BTF\tau_{\max}$, an asymptotically unbiased estimate of $\pi_{j,k}$ when $\lambda_{j,k} \gg \delta_{j,k}$ for the model of (13) is

$\hat{\pi}_{j,k} = \frac{\sum \vec{1}_{S_{j,k}}}{N_{j,k}}, \qquad S_{j,k} = \left\{\tilde{w}_{j,k,l,m} : |\tilde{w}_{j,k,l,m}| > \delta_{j,k}\sqrt{2\log N_{j,k}}\right\}.$

For $\lambda_{j,k}$ use $\hat{\lambda}_{j,k} = \max_{*}|\tilde{w}_{j,k,*}|/3$. Lastly, for the ε's, assume that the channel $w_{j,k,*}$ and noise processes are independent with $\pi \ll 1/2$; the median absolute deviation (MAD) estimator for zero-mean iid (independent, identically distributed) variates then suggests $E^{2}[\mathrm{mad}_{*}\tilde{w}_{j,k,*}] \approx \delta_{j,k}^{2}+\varepsilon_{j,k}^{2}$, leading to the estimator $\hat{\varepsilon}_{j,k}^{2} = \max\left[\delta_{j,k}^{2}/20,\;(\mathrm{mad}_{*}\tilde{w}_{j,k,*})^{2}-\delta_{j,k}^{2}\right]$.
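A minimal sketch of these empirical Bayes estimates for one (j,k) band follows. It assumes the universal threshold $\delta\sqrt{2\log N}$. It also scales the median absolute deviation about zero by 1/0.6745 so that it estimates a Gaussian standard deviation; that normalization is the usual Gaussian consistency constant and is an assumption here, as is the function name.

```python
import numpy as np


def estimate_band_hyperparameters(w_tilde, delta2):
    """Empirical Bayes (pi, lambda, eps^2) for one (j, k) band of empirical coefficients."""
    w = np.abs(np.asarray(w_tilde, dtype=float))
    n = w.size
    threshold = np.sqrt(delta2) * np.sqrt(2.0 * np.log(n))   # universal threshold
    pi_hat = np.count_nonzero(w > threshold) / n               # fraction above threshold
    lam_hat = w.max() / 3.0
    mad = np.median(w) / 0.6745                                # MAD about zero, Gaussian-scaled
    eps2_hat = max(delta2 / 20.0, mad**2 - delta2)             # floor at delta^2 / 20
    return pi_hat, lam_hat, eps2_hat


rng = np.random.default_rng(3)
n, pi_true, lam_true, delta = 100_000, 0.05, 10.0, 0.05
significant = rng.random(n) < pi_true
w = np.where(significant, lam_true, 1e-3) * rng.standard_normal(n)
w_tilde = w + delta * rng.standard_normal(n)
print(estimate_band_hyperparameters(w_tilde, delta**2))
```

The MAD-based ε estimate is inflated by the significant coefficients when π is not small. The δ²/20 floor keeps it positive when the noise dominates.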

In this and the following two paragraphs, inference regarding the Doppler spread B, or equivalently the maximum scale $J = \log_{2}BT$, is shown to be a problem of model order selection. An EB-MAP estimate as well as rules based on Laplace's approximation and Akaike's criterion (AIC) are provided. In keeping with the in-scale paradigm, it is convenient to place on B a prior equivalent to an uninformative prior on J, the maximum scale parameter. With this assumption the MAP estimate of the Doppler spread is determined by maximizing the marginal likelihood $p(\vec{r} \mid J)$; the objective then is

$\hat{J} = \arg\min_{J}\left\{-\log p(\vec{r} \mid J)\right\}, \qquad p(\vec{r} \mid B = 2^{J}/T) = \int p(\vec{r} \mid \vec{w}_{J})\,p(\vec{w}_{J} \mid \vec{\phi}_{J})\,p(\vec{\phi}_{J} \mid J)\,d\vec{w}\,d\vec{\phi}.$ The first term in the integrand is decomposed as

$p(\vec{r} \mid \vec{w}_{J}) = (\sqrt{2\pi})^{2^{J}F\tau_{\max}}\,|\Delta_{J}|^{1/2} \times \mathcal{N}_{\vec{r}}(\check{H}_{B}^{mle}\vec{s},\,I\sigma^{2}) \times \mathcal{N}_{\vec{w}_{J}}(\tilde{\vec{w}}_{J},\,\Delta_{J})$ leading to

$p(\vec{r} \mid B = 2^{J}/T) = \mathcal{N}_{\vec{r}}(\check{H}_{B}^{mle}\vec{s},\,I\sigma^{2}) \times (2\pi)^{BTF\tau_{\max}/2}\,|\Delta_{J}|^{1/2} \times \int \prod_{j,k,l,m}\Big(\mathcal{N}_{w_{j,k,l,m}}(\tilde{w}_{j,k,l,m},\delta_{j,k}^{2}) \times \big[\pi_{j,k}\,\mathcal{N}_{w_{j,k,l,m}}(0,\lambda_{j,k}^{2}) + (1-\pi_{j,k})\,\mathcal{N}_{w_{j,k,l,m}}(0,\varepsilon_{j,k}^{2})\big]\Big)\,p(\vec{\phi} \mid J)\,d\vec{w}\,d\vec{\phi}.$

Integrate over $\vec{w}$ analytically, since $\mathcal{N}_{w}(\tilde{w},\delta^{2})\,\mathcal{N}_{w}(0,\lambda^{2}) = \mathcal{N}_{w}(\gamma(\lambda)\tilde{w},\gamma(\lambda)\delta^{2})\,\mathcal{N}_{\tilde{w}}(0,\delta^{2}+\lambda^{2})$, and over the 3JK parameters of $\vec{\phi}_{J}$ using the asymptotic result of Schwarz (i.e. the integral is $\propto (FT)^{-3JK/2}$). Under the near-diagonalizing property of $U_{J}$, approximate $|\Delta_{J}| \approx \prod_{j,k,l,m}\delta_{j,k}^{2}$. Then, with the EB estimate of $\vec{\phi}_{J}$ and the residual $\tilde{\vec{e}}_{r,B} = \vec{r} - \check{H}_{B}^{mle}\vec{s}$, the MAP criterion is

$\hat{B}_{MAP} = \arg\min_{J}\{Q_{MAP}(J)\} \qquad (25)$

where

$Q_{MAP}(J) = \frac{\tilde{\vec{e}}_{r,B}'\,\tilde{\vec{e}}_{r,B}}{2\sigma^{2}} - \frac{2^{J}F\tau_{\max}}{2}\log(2\pi\delta_{j,k}^{2}) + \frac{3JK}{2}\log FT - \sum_{j,k,l,m}^{J,K}\log\mathcal{M}_{\tilde{w}_{j,k,l,m}}\big(\hat{\pi}_{j,k},\,\hat{\lambda}_{j,k}^{2}+\delta_{j,k}^{2},\,\hat{\varepsilon}_{j,k}^{2}+\delta_{j,k}^{2}\big). \qquad (26)$

It is somewhat striking and counterintuitive that this empirical Bayes MAP estimate of Doppler spread (25) does not require computing the MAP or posterior mean (PM) channel estimate: the residual errors are those of the MLE.
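The MAP cost (26) can be evaluated from quantities already available after the MLE step. In the sketch below each (j,k) band carries its own δ², so the second term of (26) is written per coefficient as $-\tfrac{1}{2}\sum\log(2\pi\delta_{j,k}^{2})$. This reduces to the printed form when δ is common to all bands. The mixture term uses the marginal $\pi\mathcal{N}(0,\lambda^{2}+\delta^{2})+(1-\pi)\mathcal{N}(0,\varepsilon^{2}+\delta^{2})$. The dictionary layout and function name are assumptions.

```python
import numpy as np


def log_normal_pdf(x, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + x * x / var)


def q_map(resid, sigma2, bands, J, K, F, T):
    """Eq. (26) for a hypothesized maximum scale J.

    resid: MLE residual r - H_mle s.
    bands: one dict per (j, k) band with keys 'w_tilde' (array), 'delta2',
           'pi', 'lam2', 'eps2' (EB estimates; lam2 and eps2 are variances).
    """
    q = resid @ resid / (2.0 * sigma2)
    q += 1.5 * J * K * np.log(F * T)                          # Schwarz penalty on phi_J
    for band in bands:
        w, d2 = np.asarray(band["w_tilde"], dtype=float), band["delta2"]
        q -= 0.5 * w.size * np.log(2.0 * np.pi * d2)          # from |Delta_J|^(1/2)
        log_big = np.log(band["pi"]) + log_normal_pdf(w, band["lam2"] + d2)
        log_small = np.log1p(-band["pi"]) + log_normal_pdf(w, band["eps2"] + d2)
        q -= np.sum(np.logaddexp(log_big, log_small))        # marginal mixture density
    return q


band = dict(w_tilde=np.array([0.0]), delta2=1.0, pi=0.5, lam2=1.0, eps2=1.0)
print(q_map(np.zeros(3), 1.0, [band], J=0, K=3, F=2000.0, T=8.0))  # 0.5*log(2) ≈ 0.3466
```

Evaluating `q_map` for each hypothesized J and taking the argmin implements (25).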

To consider rules based on the residual errors of the posterior mean or approximate MAP channel estimate, proceed as follows. Let $lp(\vec{w}) = \log p(\vec{r} \mid \vec{w})\,p(\vec{w} \mid \vec{\phi},J)$ (the posterior of $\vec{w}$ being proportional to $\exp(lp(\vec{w}))$) and expand about $\vec{w}^{map}$: $lp(\vec{w}) \approx lp(\vec{w}^{map}) + (\vec{w}-\vec{w}^{map})'\,lp''(\vec{w}^{map})\,(\vec{w}-\vec{w}^{map})/2$, the first-order term vanishing at the maximum. Approximate $lp''(\vec{w}^{map}) \approx -V^{-1}$ and substitute the posterior mean $\hat{\vec{w}}_{J}$ and covariance $V_{B}$, yielding

$p(\vec{r} \mid J) \approx p(\vec{r} \mid \hat{\vec{h}}_{B})\,(2\pi)^{BTF\tau_{\max}/2}\,|V_{B}|^{1/2}\int p(\hat{\vec{w}}_{J} \mid \vec{\phi}_{J},J)\,p(\vec{\phi} \mid J)\,d\vec{\phi}.$ With $|V_{B}|^{1/2} = \prod\vec{v}_{J}^{1/2}$ and the asymptotic result of Schwarz applied to $\vec{\phi}_{J}$, the resulting Laplace approximation rule is

$\hat{B}_{LAP} = \arg\min_{J}\{Q_{LAP}(J)\} \qquad (27)$

where

$Q_{LAP}(J) = -\log p(\vec{r} \mid \hat{\vec{w}}_{J}) - \log p(\hat{\vec{w}} \mid J) + \frac{3JK}{2}\log FT - \frac{1}{2}\sum_{j,k,l,m}\log 2\pi v_{j,k,l,m}^{2}. \qquad (28)$ The first term

$-\log p(\vec{r} \mid \hat{\vec{w}}_{J})$ measures how well the posterior mean channel estimate predicts the data $\vec{r}$ and can be interpreted as the coding complexity of the data given the channel estimate. From this perspective it is proportional to the length of the best code for the residuals of the data predicted by the adaptive filter estimate and source signal. Similarly, the sum of the second and third terms is proportional to the information necessary to specify this particular channel estimate $\hat{\vec{h}}_{B}$ from the prior assignment, to the resolution associated with the posterior variance.

A popular and heuristically simple alternative is Akaike's Information Criterion (AIC) (for a reasonable explanation of this ad-hoc measure see Kay). The associated penalty in this case is simply one half the number of model parameters:

$\hat{B}_{AIC} = \arg\min_{J}\{Q_{AIC}(J)\} \qquad (29)$

where

$Q_{AIC}(J) = -\log p(\vec{r} \mid \hat{\vec{w}}_{J}) + \frac{3JK}{2} + 2^{J-1}F\tau_{\max}. \qquad (30)$
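The Laplace and AIC rules share the data term $-\log p(\vec{r} \mid \hat{\vec{w}}_{J})$. Assuming white Gaussian noise of variance σ² over n samples, this term is $\|\vec{r}-\hat{H}_{B}\vec{s}\|^{2}/(2\sigma^{2})+\tfrac{n}{2}\log(2\pi\sigma^{2})$. The sketch evaluates (28) and (30). It also includes a greedy in-scale search that stops at the first J whose cost does not improve, mirroring the in-scale stopping described for the LTI test case. The greedy stopping and all names are assumptions, and the prior term $\log p(\hat{\vec{w}} \mid J)$ is supplied by the caller.

```python
import numpy as np


def gaussian_nll(resid, sigma2):
    """-log p(r | w_hat) for white Gaussian noise of variance sigma2."""
    return resid @ resid / (2.0 * sigma2) + 0.5 * resid.size * np.log(2.0 * np.pi * sigma2)


def q_lap(resid, sigma2, log_prior_w_hat, post_var, J, K, F, T):
    """Eq. (28); post_var holds the posterior variances v^2 of eq. (20)."""
    return (gaussian_nll(resid, sigma2) - log_prior_w_hat
            + 1.5 * J * K * np.log(F * T)
            - 0.5 * np.sum(np.log(2.0 * np.pi * np.asarray(post_var))))


def q_aic(resid, sigma2, J, K, F, tau_max):
    """Eq. (30): data term plus half the parameter count 3JK + 2^J F tau_max."""
    return gaussian_nll(resid, sigma2) + 1.5 * J * K + 2.0 ** (J - 1) * F * tau_max


def select_scale(cost, J_max):
    """Greedy in-scale search: raise J until cost(J) stops decreasing."""
    best_J, best_q = 0, cost(0)
    for J in range(1, J_max + 1):
        q = cost(J)
        if q >= best_q:
            break
        best_J, best_q = J, q
    return best_J, best_q


print(select_scale(lambda J: (J - 2) ** 2, J_max=5))   # (2, 0)
```

A full argmin over all J up to `J_max` would guard against non-monotone costs at the price of computing every scale. The greedy loop keeps the in-scale advantage of stopping early.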

These in-scale algorithms were tested on a diverse set of simulated multipath channels to determine their relative performance in jointly estimating the channel operator and its Doppler spread.

The considered test channels are of the form

$H(t,\tau) = \sum_{m=1}^{M_{p}}\alpha_{m}(t)\,\chi_{m}\big(\tau - d_{m}(t)\big) \qquad (31)$

having $M_{p}$ independent component paths. Each path has an associated time-invariant arrival spreading function $\chi_{m}(\cdot)$. Doppler spread is induced by both the path gain processes $\alpha_{m}(t)$ and the path delay processes $d_{m}(t)$. Arrival path delay times $d_{m}(t)$ and amplitudes $\alpha_{m}(t)$ are modeled as correlated Gaussian variates with respective covariances $c_{d_{m}}(t) = c_{d_{m}}(0)(1-|t|/T_{d_{m}})$ for $|t| < T_{d_{m}}$, 0 otherwise, and $c_{\alpha_{m}}(t) = c_{\alpha_{m}}(0)(1-|t|/T_{\alpha_{m}})$ for $|t| < T_{\alpha_{m}}$, 0 otherwise. A realization of such a channel is then generated given the $3M_{p}$ parameters: the weight functions $\chi_{m}(\cdot)$ and the coherence times $T_{d_{m}}$ and $T_{\alpha_{m}}$. The associated Doppler bandwidths of these path processes are $B_{\alpha_{m}} = 2/T_{\alpha_{m}}$ and $B_{d_{m}} = 2/T_{d_{m}}$. For each test case the Doppler spread is taken as the maximum of these constituent Doppler bandwidths. Varying the correlation times of the processes $d_{m}$ and $\alpha_{m}$, as well as the spreading shapes $\chi_{m}$, allows simulation of processes that are diverse and realistic.
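One way to draw a realization of (31) is sketched below. A triangular covariance $c(0)(1-|t|/T_{c})$ for $|t|<T_{c}$ is exactly the autocovariance of white noise passed through a moving sum of length $T_{c}$ and scaled by $1/\sqrt{T_{c}}$. That gives a cheap generator on a uniform time grid. The Gaussian pulse used for $\chi_{m}$, the nonzero means of $\alpha_{m}$ and $d_{m}$, the sampling grids and the parameter names are illustrative assumptions; the document does not specify them.

```python
import numpy as np


def triangular_gp(n, corr_samples, var, rng):
    """Gaussian sequence with covariance var * (1 - |k|/L) for |k| < L, 0 otherwise,
    obtained as a moving sum of L white samples scaled by 1/sqrt(L)."""
    L = max(1, int(round(corr_samples)))
    white = rng.standard_normal(n + L - 1)
    return np.sqrt(var / L) * np.convolve(white, np.ones(L), mode="valid")


def simulate_channel(t, tau, paths, rng):
    """H(t, tau) = sum_m alpha_m(t) chi_m(tau - d_m(t)), eq. (31), on a uniform t grid."""
    dt = t[1] - t[0]
    H = np.zeros((t.size, tau.size))
    for p in paths:
        alpha = p["alpha_mean"] + triangular_gp(t.size, p["T_alpha"] / dt, p["var_alpha"], rng)
        delay = p["d_mean"] + triangular_gp(t.size, p["T_d"] / dt, p["var_d"], rng)
        # Hypothetical arrival shape chi_m: a Gaussian pulse of width p["width"].
        chi = np.exp(-0.5 * ((tau[None, :] - delay[:, None]) / p["width"]) ** 2)
        H += alpha[:, None] * chi
    return H


rng = np.random.default_rng(4)
t = np.arange(512) / 64.0        # T = 8 s on an assumed 64 Hz grid
tau = np.arange(80) / 4000.0     # 20 ms of delay sampled at 1/(2F), F = 2 kHz
path = dict(alpha_mean=1.0, var_alpha=0.1, T_alpha=2.0,       # B_alpha = 2 / T_alpha = 1 Hz
            d_mean=0.005, var_d=1e-8, T_d=0.5, width=2.5e-4)  # B_d = 2 / T_d = 4 Hz
H = simulate_channel(t, tau, [path], rng)
print(H.shape)   # (512, 80)
```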

Four channel operators were tested; the number of paths and the maximal Doppler spread for each case are listed in Table 2.

TABLE 2 Test cases for time-varying channel parameters

| Case | $M_{p}$ (paths) | B (Doppler spread) |
|---|---|---|
| LTI | 4 | 0 Hz |
| TV α's | 4 | 1 Hz |
| TV delays | 4 | 4 Hz |
| TV delays & α's | 4 | 4 Hz |

Some channels share a common feature of sparsity: while each scatterer path exhibits Gaussian uncorrelated scattering, the individual delay lags do not. Thus the channels are not wide-sense stationary uncorrelated scattering (WSS-US).

Gaussian source $\vec{s}$ and additive noise signals of the same spectrum were used in the simulations. The source signal correlation is $c_{\vec{s}}(\tau) = E[s(t)s(t+\tau)] = \zeta^{2}(1-2|\tau|F)$ for $|\tau| < 1/(2F)$, 0 otherwise, and $c_{\vec{n}}(\tau) = \sigma^{2}c_{\vec{s}}(\tau)/\zeta^{2}$. The channels were simulated for durations of approximately T = 8 seconds and a bandwidth F = 2 kHz.

In these simulations $C_{B}$ and $S_{B}$ were used as approximations to $C_{B}^{\xi}$ and $S_{B}^{\xi}$ in the CG algorithm, for the computational savings associated with their tight banded structure. In practice the number of CG iterations needed to compute $\tilde{\vec{h}}_{B}^{a}$, starting from $(D_{\xi}^{1} \otimes I_{2^{K}})\tilde{\vec{h}}_{B/2}^{a}$, to a chosen precision of $0.1 \times P \times \delta_{J}^{2}$ is approximately 5. Further iterations are used to compute the MLE. The algorithm was implemented with Daubechies orthogonal wavelets of order 3 ($\psi = D(3)$) in delay and $\xi = D(5)$ in time for $U_{J}$, under the folded wavelet assumption. The interpolation operator $D_{\xi}^{-q}$ of (3) was implemented in this simulation as a windowed sinc, for the simplicity associated with its linear phase, and q was set to 4, so that for a given hypothesized J the estimate was assumed LTI over durations of $T/2^{J+4}$.

The normalized mean square error of the channel estimates,

$\|e_{h}\|_{2}^{2} = \frac{(\hat{\vec{h}}_{B}-\vec{h}_{B})'(\hat{\vec{h}}_{B}-\vec{h}_{B})}{\vec{h}_{B}'\vec{h}_{B}},$

is computed for each of the estimators: the approximate MLE $\tilde{\vec{h}}_{B}^{a}$ (aMLE), the MLE $\vec{h}_{B}^{mle}$, the posterior mean $\hat{\vec{h}}_{B}$ (PM), and the approximate MAP.
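The error measures used in these comparisons reduce to a few lines. The sketch computes the normalized squared error above and a normalized p-norm error, with p = 8 serving as the surrogate for $L_{\infty}$ loss used in the tests that follow. Normalizing the p-norm error by $\|\vec{h}_{B}\|_{p}$ is an assumption, since only the norm order is stated; the function names are hypothetical.

```python
import numpy as np


def nmse(h_hat, h):
    """(h_hat - h)'(h_hat - h) / (h'h)."""
    h = np.asarray(h, dtype=float)
    e = np.asarray(h_hat, dtype=float) - h
    return (e @ e) / (h @ h)


def normalized_p_error(h_hat, h, p=8):
    """||h_hat - h||_p / ||h||_p; large p emphasizes the worst (peak) errors."""
    h = np.asarray(h, dtype=float)
    e = np.asarray(h_hat, dtype=float) - h
    return np.linalg.norm(e, ord=p) / np.linalg.norm(h, ord=p)


h = np.array([3.0, 4.0])
print(nmse(h + np.array([0.3, 0.4]), h))   # 0.01 up to rounding
```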

Granular noise is present in the MLE and represents the cost incurred by the use of this unbiased estimator. With the sparsity prior, the PM provides a more regular estimate that exhibits better MSE performance. However, this performance comes at the cost of bias in the PM estimate. The bias is pronounced at the peaks of the channel response estimate, as is easily verified by considering loss functions close to $L_{\infty}$. Tests conducted with $\|e_{h}\|_{p}$, p = 8, to approximate $L_{\infty}$ loss demonstrate that the MLE often outperforms the PM even for these sparse test cases.

To determine the ability of the in-scale approach to accurately determine Doppler spread, the algorithm can be tested on an LTI operator. One important attribute of the in-scale approach is that LTI filters can be recognized even at quite low SNR. The in-scale algorithm can identify LTI channels with only two iterations in the scale domain. The first iteration is the LTI estimate. The second is the scale-1 iteration; the LTI estimate compares favorably with it, and the in-scale iteration stops. This is in stark contrast to in-time recursions, where computational resources are distributed over time without regard to the actual coherence time of the channel process. Thus for finite-duration signaling the in-scale approach has a distinct advantage. The Doppler complexity measure is a minimum for the J = 0th scale estimate under all three criteria (AIC, LAP, MAP) and does not fall below this value at higher-resolution estimates. Here each Doppler cost is normalized by $E[\log p(\vec{r} \mid \vec{h}_{B})] = -(1+\log\sigma^{2})FT/2$. It is further noted that for channels that are sparse in delay the mixture normal model is an effective sparsifying model, affording a 2 dB improvement in performance over the MLE at the 0th scale. The PM estimate exhibits greater gains at higher scales, demonstrating the robustness of the posterior mean and MAP estimates against variance of the Doppler spread rule.

Monte Carlo simulations reveal slightly greater variability of the MAP stopping rule than of the AIC or LAP rules. We can demonstrate that modeling the time-varying filter as a process in scale yields estimators that jointly estimate the channel response and the coherence time. In this framework, estimating the coherence time is a problem of model order selection.

An MLE, approximate MLE, approximate MAP and posterior mean were derived, along with AIC, Laplace approximation and MAP Doppler estimation procedures based on empirical Bayes assumptions. These algorithms were tested and compared with one another on a number of simulated time-varying channels. As expected, the posterior mean improves on the MLE and approximate MAP for time-varying channels in terms of mean square error.

All of the estimators presented for in-scale processing rely on the conjugate gradient algorithm to compute approximate MLE estimates from the previous lower-scale estimate. Given the MLE and conditional variance together with a sparsity prior, the posterior mean is a simple adaptive thresholding of wavelet coefficients. In this framework computation is distributed over scale, in contrast with in-time Kalman methods that distribute computation over time for an assumed coherence time. Joint estimation of channel parameters and coherence time is fundamental to in-scale estimation, as it provides a stopping rule beyond which additional computation is unlikely to improve the estimate. Lastly, covariance estimates of the time-varying model are provided, and approximate high-probability regions are shown to be easily accessible. These present useful, computationally efficient empirical Bayes lower bounds on certainty bands of the channel operator.

The implications for multisensor and multichannel estimation are apparent. Computational resources can be automatically and seamlessly focused on system nodes that exhibit greater Doppler bandwidth. This implies two advantages. First, channels that are more coherent will not waste computational resources. Second, performance degradation is limited, since overfitting of data with overly complex models is less likely.

Important issues relating to computation and performance relative to in-time recursions must be addressed. Future work should focus on comparison of in-scale recursions with in-time recursions (e.g. Kalman, RLS) to give bounds on signal duration, channel sparsity and coherence time measures where the in-scale regime is to be favored. The in-scale approach could be broadened to include other scale stepping increments rather than the classic octave band partition presented here. Mixed scale-time methods can also be envisioned for block processing over time.

The following matrix and Kronecker properties are useful.

Properties:

1. $[AHB]^{st} = (B' \otimes A)[H]^{st}$
2. $(AC \otimes BD) = (A \otimes B)(C \otimes D)$
3. $(A \odot C) \otimes (B \odot D) = (A \otimes B) \odot (C \otimes D)$, where $A_{M \times N}$, $C_{M \times N}$ and $B_{L \times K}$, $D_{L \times K}$
4. $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$, and $(A \otimes B)' = A' \otimes B'$
5. $\mathrm{diag}(AB) = [A \odot B']\,\vec{1}_{N}$, where $A_{M \times N}$ and $B_{N \times M}$
6. $\mathrm{diag}(A \odot B) = \mathrm{diag}(A) \odot \mathrm{diag}(B)$, where $A_{N \times N}$ and $B_{N \times N}$
7. $A\vec{b} = A\,\mathrm{Diag}(\vec{b})\,\vec{1}_{N}$
8. $A\,\mathrm{Diag}(\vec{b}) = A \odot (\vec{b}' \otimes \vec{1}_{M})$, where $A_{M \times N}$ and $\vec{b}_{N \times 1}$

The following result is useful when computing the posterior variance of a time-varying operator. For $A_{K\times L}$, $C_{L\times K}$, $B_{M\times N}$, $D_{N\times M}$ and $\vec{v}_{LN\times 1}$, if $V = (A \otimes B)\,\mathrm{Diag}(\vec{v})\,(C \otimes D)$, then

$\mathrm{diag}(V) = \left[(A \odot C') \otimes (B \odot D')\right]\vec{v} \qquad (32)$

Proof:

$\begin{matrix} V & = & {\left( {\left( {A \otimes B} \right) \odot \left( {v^{\prime} \otimes {\overset{\rightarrow}{1}}_{KM}} \right)} \right)\left( {C \otimes D} \right)} & {{property}\mspace{14mu} 8} \\ {{diag}(V)} & = & {\left\lbrack {\left( {A \otimes B} \right) \odot \left( {v^{\prime} \otimes {\overset{\rightarrow}{1}}_{KM}} \right) \odot \left( {C^{\prime} \otimes D^{\prime}} \right)} \right\rbrack{\overset{\rightarrow}{1}}_{L\; N}} & {{property}\mspace{14mu} 5} \\ \; & = & {\left\lbrack {\left( {\left( {A \otimes B} \right) \odot \left( {C^{\prime} \otimes D^{\prime}} \right)} \right){{Diag}\left( \overset{\rightarrow}{v} \right)}} \right\rbrack{\overset{\rightarrow}{1}}_{L\; N}} & {{{property}\mspace{14mu} 8}\;} \\ \; & = & {\left\lbrack {\left( {A \otimes B} \right) \odot \left( {C^{\prime} \otimes D^{\prime}} \right)} \right\rbrack\overset{\rightarrow}{v}} & {{property}\mspace{14mu} 7} \\ \; & = & {\left\lbrack {\left( {A \odot C^{\prime}} \right) \otimes \left( {B \odot D^{\prime}} \right)} \right\rbrack\overset{\rightarrow}{v}} & {{property}\mspace{14mu} 3} \end{matrix}$
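The identity (32) can be spot-checked numerically. The sketch below, again plain Python with hypothetical helper names and random real test matrices, forms $V$ explicitly and compares its diagonal with the right-hand side of (32). The practical value of (32) shows up in `diag_identity`: it needs only the $KM \times LN$ matrix $(A \odot C') \otimes (B \odot D')$ and never builds the $KM \times KM$ matrix $V$.

```python
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def kron(X, Y):
    p, q = len(Y), len(Y[0])
    return [[X[i // p][j // q] * Y[i % p][j % q] for j in range(len(X[0]) * q)]
            for i in range(len(X) * p)]


def hadamard(X, Y):
    return [[a * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def diag_direct(A, B, C, D, v):
    """diag(V) for V = (A (x) B) Diag(v) (C (x) D), forming the KM x KM matrix V."""
    # Right-multiplying by Diag(v) scales column j by v[j] (property 8).
    scaled = [[a * vj for a, vj in zip(row, v)] for row in kron(A, B)]
    V = matmul(scaled, kron(C, D))
    return [V[i][i] for i in range(len(V))]


def diag_identity(A, B, C, D, v):
    """Right-hand side of (32): [(A (.) C') (x) (B (.) D')] v; V is never formed."""
    G = kron(hadamard(A, transpose(C)), hadamard(B, transpose(D)))
    return [sum(g * vj for g, vj in zip(row, v)) for row in G]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(1)
K, L, M, N = 2, 3, 4, 2
A, C = random_matrix(K, L, rng), random_matrix(L, K, rng)
B, D = random_matrix(M, N, rng), random_matrix(N, M, rng)
v = [rng.gauss(0.0, 1.0) for _ in range(L * N)]
lhs = diag_direct(A, B, C, D, v)
rhs = diag_identity(A, B, C, D, v)
```

Nested lists are used only to keep the example dependency-free; a practical implementation would more likely rely on a numerical array library.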

The following form of the posterior distribution is useful. The prior and likelihood on the wavelet coefficients are, respectively,

$w_{j,k,*} \mid \pi_{j,k}, \lambda_{j,k}, \varepsilon_{j,k} \sim \pi_{j,k}\,\mathcal{N}_{w_{j,k,*}}(0, \lambda_{j,k}^2) + (1 - \pi_{j,k})\,\mathcal{N}_{w_{j,k,*}}(0, \varepsilon_{j,k}^2)$

and

$\tilde{w}_{j,k,*} \mid w_{j,k,*} \sim \mathcal{N}_{\tilde{w}_{j,k,*}}(w_{j,k,*}, \delta_{j,k}^2).$

The marginal density of the empirical coefficients is then

$p(\tilde{w}_{j,k,*} \mid \vec{\phi}_J) = \int p(\tilde{w}_{j,k,*} \mid w_{j,k,*})\, p(w_{j,k,*} \mid \pi_{j,k}, \lambda_{j,k}, \varepsilon_{j,k})\, dw_{j,k,*} = \pi_{j,k}\,\mathcal{N}_{\tilde{w}_{j,k,*}}(0, \lambda_{j,k}^2 + \delta_{j,k}^2) + (1 - \pi_{j,k})\,\mathcal{N}_{\tilde{w}_{j,k,*}}(0, \varepsilon_{j,k}^2 + \delta_{j,k}^2).$

Since

$\mathcal{N}_{w_{j,k,*}}(\tilde{w}_{j,k,*}, \delta_{j,k}^2)\,\mathcal{N}_{w_{j,k,*}}(0, \lambda_{j,k}^2) = \mathcal{N}_{w_{j,k,*}}\!\left(\gamma(\lambda_{j,k})\tilde{w}_{j,k,*},\, \gamma(\lambda_{j,k})\delta_{j,k}^2\right)\mathcal{N}_{\tilde{w}_{j,k,*}}(0, \lambda_{j,k}^2 + \delta_{j,k}^2),$

where $\gamma(\lambda) = \lambda^2/(\lambda^2 + \delta^2)$ is akin to a Wiener gain, a direct application of Bayes' theorem, $p(w \mid \tilde{w}) = p(\tilde{w} \mid w)\,p(w)/p(\tilde{w})$, confirms the posterior density to be

$w_{j,k,*} \mid \tilde{w}_{j,k,*}, \vec{\phi}_J \sim \Pi(\tilde{w}_{j,k,*})\,\mathcal{N}_{w_{j,k,*}}\!\left(\gamma(\lambda_{j,k})\tilde{w}_{j,k,*},\, \gamma(\lambda_{j,k})\delta_{j,k}^2\right) + \left(1 - \Pi(\tilde{w}_{j,k,*})\right)\mathcal{N}_{w_{j,k,*}}\!\left(\gamma(\varepsilon_{j,k})\tilde{w}_{j,k,*},\, \gamma(\varepsilon_{j,k})\delta_{j,k}^2\right)$

where

$\Pi(\tilde{w}_{j,k,*}) = \left[1 + \frac{(1 - \pi_{j,k})\,\mathcal{N}_{\tilde{w}_{j,k,*}}(0, \varepsilon_{j,k}^2 + \delta_{j,k}^2)}{\pi_{j,k}\,\mathcal{N}_{\tilde{w}_{j,k,*}}(0, \lambda_{j,k}^2 + \delta_{j,k}^2)}\right]^{-1} \qquad (33)$
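A minimal numerical sketch of (33) follows, assuming real-valued scalar coefficients with the indices $j, k$ dropped and purely illustrative parameter values; the function names are hypothetical. It evaluates $\Pi(\tilde{w})$ in the log domain (so that the narrow Gaussian does not underflow for large $|\tilde{w}|$), forms the posterior mean $[\Pi\gamma(\lambda) + (1-\Pi)\gamma(\varepsilon)]\tilde{w}$ implied by the mixture, and checks it against brute-force Bayes integration on a grid. The test also spot-checks the Gaussian product identity used above.

```python
import math


def gauss_pdf(x, mean, var):
    """Real-valued Gaussian density N_x(mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_gauss0(x, var):
    """log N_x(0, var), used to keep Pi(w~) numerically stable."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def gain(s2, d2):
    """Wiener-like gain gamma(s) = s^2 / (s^2 + delta^2), with s2 = s^2, d2 = delta^2."""
    return s2 / (s2 + d2)


def posterior_weight(wt, pi, lam2, eps2, d2):
    """Pi(w~) of eq. (33): posterior probability of the wide (lambda) component."""
    t = (math.log(1.0 - pi) + log_gauss0(wt, eps2 + d2)) - (
        math.log(pi) + log_gauss0(wt, lam2 + d2))
    if t > 0.0:  # Pi = 1 / (1 + e^t), arranged so exp() never overflows
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def posterior_mean(wt, pi, lam2, eps2, d2):
    """Mean of the two-component posterior mixture in eq. (33)."""
    p = posterior_weight(wt, pi, lam2, eps2, d2)
    return (p * gain(lam2, d2) + (1.0 - p) * gain(eps2, d2)) * wt


def posterior_mean_brute_force(wt, pi, lam2, eps2, d2, half_width=30.0, n=60001):
    """Direct Bayes on a uniform grid: int w p(w~|w) p(w) dw / int p(w~|w) p(w) dw."""
    h = 2.0 * half_width / (n - 1)
    num = den = 0.0
    for i in range(n):
        w = -half_width + i * h
        prior = pi * gauss_pdf(w, 0.0, lam2) + (1.0 - pi) * gauss_pdf(w, 0.0, eps2)
        joint = gauss_pdf(wt, w, d2) * prior
        num += w * joint
        den += joint
    return num / den  # the grid spacing cancels in the ratio
```

Small empirical coefficients receive a small $\Pi$ and are shrunk toward zero by $\gamma(\varepsilon)$, while large ones are passed with gain close to $\gamma(\lambda)$.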

The following form of the variance is useful. For the variance, start with $\mathrm{var}[w_{j,k,*} \mid \tilde{w}_{j,k,*}, \vec{\phi}_J] = E[w_{j,k,*}^2 \mid \tilde{w}_{j,k,*}, \vec{\phi}_J] - E^2[w_{j,k,*} \mid \tilde{w}_{j,k,*}, \vec{\phi}_J]$. For simplicity, denote the modulus square of a coefficient by $x^2 = x'x$. From the posterior density, the second moment $E[w_{j,k,*}^2 \mid \tilde{w}_{j,k,*}, \vec{\phi}_J]$ is easily evaluated as

$E[w_{j,k,*}^2 \mid \tilde{w}_{j,k,*}, \vec{\phi}_J] = \Pi(\tilde{w}_{j,k,*})\left(\gamma(\lambda)\delta_{j,k}^2 + \gamma^2(\lambda)\tilde{w}_{j,k,*}^2\right) + \left(1 - \Pi(\tilde{w}_{j,k,*})\right)\left(\gamma(\varepsilon)\delta_{j,k}^2 + \gamma^2(\varepsilon)\tilde{w}_{j,k,*}^2\right),$

since each Gaussian component contributes its variance plus its squared mean, and $E^2[w_{j,k,*} \mid \tilde{w}_{j,k,*}, \vec{\phi}_J]$ is taken from (17). The result after simplification is

$\mathrm{var}[w_{j,k,*} \mid \tilde{w}_{j,k,*}, \vec{\phi}_J] = \left[\Pi(\tilde{w}_{j,k,*})\gamma(\lambda) + \left(1 - \Pi(\tilde{w}_{j,k,*})\right)\gamma(\varepsilon)\right]\delta_{j,k}^2 + \Pi(\tilde{w}_{j,k,*})\left(1 - \Pi(\tilde{w}_{j,k,*})\right)\left[\gamma(\lambda) - \gamma(\varepsilon)\right]^2\tilde{w}_{j,k,*}^2. \qquad (34)$

Since $\Pi(x) \to 0$ as $x \to 0$ and $\Pi(x) \to 1$ as $x \to \infty$, it is easily confirmed that, for $\lambda \gg \delta$ and $\varepsilon/\delta \ll 1$,

$\lim_{\tilde{w} \to \infty} \mathrm{var}[w \mid \tilde{w}] \approx \delta^2, \qquad \lim_{\tilde{w} \to 0} \mathrm{var}[w \mid \tilde{w}] \approx \varepsilon^2, \qquad (35)$

while the maximum variance attainable is approximately $\mathrm{var}[w \mid \tilde{w} = \tilde{w}_p] \approx \delta^2\left(2 + \log\left(\lambda^2/(\delta^2\pi^2)\right)\right)/4$, where

$\begin{matrix} \begin{matrix} {{\overset{\sim}{w}}_{p}^{2} = {\arg\left\{ {{\Pi\left( \overset{\sim}{w} \right)} = {1/2}} \right\}}} \\ {= {\frac{\left( {\lambda^{2} + \delta^{2}} \right)\left( {\varepsilon^{2} + \delta^{2}} \right)}{\lambda^{2} - \varepsilon^{2}}{\log\left( {\frac{\left( {\lambda^{2} + \delta^{2}} \right)}{\left( {\varepsilon^{2} + \delta^{2}} \right)}\frac{\left( {1 - \pi} \right)^{2}}{\pi^{2}}} \right)}}} \end{matrix} & (36) \end{matrix}$ 
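The sketch below evaluates (34) and cross-checks it against $E[w^2 \mid \tilde{w}] - E^2[w \mid \tilde{w}]$, using the second moment in which each mixture component contributes $\gamma\delta^2 + \gamma^2\tilde{w}^2$. It also confirms that $\tilde{w}_p$ from (36) gives $\Pi(\tilde{w}_p) = 1/2$ and compares the variance there with the approximation quoted above. It assumes real-valued coefficients (the case from which (36) follows); parameter values and function names are hypothetical. A small prior weight $\pi$ is chosen so that $\Pi(0)$ is negligible, which the limits in (35) presuppose.

```python
import math


def log_gauss0(x, var):
    """log N_x(0, var) for a real-valued coefficient."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def gain(s2, d2):
    """Wiener-like gain gamma = s^2 / (s^2 + delta^2)."""
    return s2 / (s2 + d2)


def posterior_weight(wt, pi, lam2, eps2, d2):
    """Pi(w~) from eq. (33), computed in the log domain."""
    t = (math.log(1.0 - pi) + log_gauss0(wt, eps2 + d2)) - (
        math.log(pi) + log_gauss0(wt, lam2 + d2))
    if t > 0.0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def posterior_variance(wt, pi, lam2, eps2, d2):
    """Closed form of eq. (34)."""
    p = posterior_weight(wt, pi, lam2, eps2, d2)
    gl, ge = gain(lam2, d2), gain(eps2, d2)
    return (p * gl + (1.0 - p) * ge) * d2 + p * (1.0 - p) * (gl - ge) ** 2 * wt * wt


def variance_from_moments(wt, pi, lam2, eps2, d2):
    """E[w^2 | w~] - E^2[w | w~] computed from the mixture moments."""
    p = posterior_weight(wt, pi, lam2, eps2, d2)
    gl, ge = gain(lam2, d2), gain(eps2, d2)
    m2 = p * (gl * d2 + gl * gl * wt * wt) + (1.0 - p) * (ge * d2 + ge * ge * wt * wt)
    m1 = (p * gl + (1.0 - p) * ge) * wt
    return m2 - m1 * m1


def peak_point_sq(pi, lam2, eps2, d2):
    """w~_p^2 from eq. (36): the point where Pi(w~) = 1/2."""
    scale = (lam2 + d2) * (eps2 + d2) / (lam2 - eps2)
    return scale * math.log((lam2 + d2) / (eps2 + d2) * (1.0 - pi) ** 2 / pi ** 2)


# Illustrative regime: lambda >> delta, eps << delta, small prior weight pi.
PI, LAM2, EPS2, D2 = 1e-4, 1e4, 1e-4, 1.0
wp = math.sqrt(peak_point_sq(PI, LAM2, EPS2, D2))
peak_var = posterior_variance(wp, PI, LAM2, EPS2, D2)
approx_peak_var = D2 * (2.0 + math.log(LAM2 / (D2 * PI ** 2))) / 4.0
```

With these values the variance is close to $\varepsilon^2$ at $\tilde{w} = 0$, rises to roughly the quoted peak near $\tilde{w}_p$, and settles near $\delta^2$ for large $\tilde{w}$.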

1. A method for scale adaptive filtering to identify a relationship between a source signal and an actual received signal, said method comprising: calculating a scale wherein said scale includes a time delay of a channel operator of the actual received signal, and a frequency spread; estimating at least one wavelet coefficient; determining an estimated received signal at the scale using the wavelet coefficient; determining when the estimated received signal is sufficiently similar with respect to the actual received signal; and re-calculating the scale by a factor when the estimated received signal is not sufficiently similar with respect to the actual received signal.
 2. The method of claim 1, wherein said recalculating the scale by a factor includes doubling the factor.
 3. The method of claim 1, wherein determining when the estimated received signal is sufficiently similar to the actual received signal comprises: comparing the estimated received signal to the actual received signal; and determining when the nuisance parameters in the actual received signal are not greater than the estimated received signal.
 4. The method of claim 1, wherein determining when the estimated received signal is sufficiently similar to the actual received signal includes employing the Empirical Bayes Estimator.
 5. The method of claim 1, wherein said estimating at least one wavelet coefficient includes processing of a conjugate gradient.
 6. The method of claim 4, further comprising processing $\tilde{\vec{w}}_J$ via a conjugate gradient.
 7. A system for scale adaptive filtering operable to identify a relationship between a source signal and an actual received signal, said system comprises: a receiver operable to receive said actual received signal; and a processor operable to calculate a scale wherein said scale includes a time delay and a frequency spread, estimate at least one wavelet coefficient, determine an estimated received signal at the scale using the wavelet coefficient, determine when the estimated received signal is sufficiently similar with respect to the actual received signal, and re-calculate the scale by a factor when the estimated received signal is not sufficiently similar with respect to the actual received signal.
 8. The system of claim 7, wherein said processor is operable to double the factor.
 9. The system of claim 7, wherein said processor is operable to estimate the wavelet coefficient by processing a conjugate gradient.
 10. The system of claim 7, wherein said processor is operable to process the wavelet coefficient by processing $\tilde{\vec{w}}_J$ via a conjugate gradient.
 11. A computer readable medium of instructions for adaptive filtering to identify a relationship between a source signal and an actual received signal, comprising: a first set of instructions adapted to calculate a scale wherein said scale includes a time delay and a frequency spread; a second set of instructions adapted to estimate at least one wavelet coefficient; a third set of instructions adapted to determine an estimated received signal at the scale using the wavelet coefficient; a fourth set of instructions adapted to determine when the estimated received signal is sufficiently similar with respect to the actual received signal; and a fifth set of instructions adapted to re-calculate the scale by a factor when the estimated received signal is not sufficiently similar with respect to the actual received signal.
 12. The computer readable medium of claim 11, further comprising: a sixth set of instructions adapted to recalculate the scale by the factor, including doubling the factor.
 13. The computer readable medium of claim 11, further comprising: a seventh set of instructions adapted to estimate at least one wavelet coefficient, including processing a conjugate gradient.